The stock market has historically been a good place to invest and save for the future. There are several compelling reasons to invest in stocks: doing so can help fight inflation, build wealth, and provide some tax benefits. Steady returns sustained over a long period can also grow far more than seems possible. Thanks to the power of compounding, the earlier one starts investing, the larger the corpus one can accumulate for retirement. Overall, investing in stocks can help meet life's financial aspirations.
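To make the point about starting early concrete, here is a minimal sketch of annual compounding. The principal, the 7% return, and the holding periods are hypothetical figures chosen only for illustration; they are not taken from the dataset and are not a forecast.

```python
def future_value(principal, annual_rate, years):
    """Value of a lump sum compounded once per year at a fixed rate."""
    return principal * (1 + annual_rate) ** years


# Hypothetical inputs: the same 10,000 invested at an assumed 7% annual return,
# once with a 30-year horizon and once with a 20-year horizon.
early = future_value(10_000, 0.07, 30)
late = future_value(10_000, 0.07, 20)

print(f"Invested for 30 years: {early:,.0f}")
print(f"Invested for 20 years: {late:,.0f}")
print(f"Ratio early / late:    {early / late:.2f}")
```

Under these assumptions the extra ten years roughly doubles the final amount, which is the effect the paragraph above describes.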
It is important to maintain a diversified portfolio when investing in stocks so that earnings stay resilient under different market conditions. A diversified portfolio tends to deliver better risk-adjusted returns because it tempers potential losses when the market is down. It is easy to get lost in a sea of financial metrics when determining the worth of a single stock, and repeating that analysis across a multitude of stocks to find the right picks for an individual is tedious. Cluster analysis helps by identifying stocks that exhibit similar characteristics and stocks that are minimally correlated with one another. This helps investors compare stocks across market segments and protect the portfolio against risks that would leave it vulnerable to losses.
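The link between low correlation and lower risk can be sketched with the standard portfolio variance formula, sigma_p^2 = w^T Σ w. The weights, volatilities, and correlations below are hypothetical, and the helper `portfolio_volatility` is introduced here only for illustration.

```python
import numpy as np


def portfolio_volatility(weights, vols, corr):
    """Portfolio standard deviation from asset volatilities and a correlation matrix."""
    weights = np.asarray(weights, dtype=float)
    vols = np.asarray(vols, dtype=float)
    # Covariance matrix: cov[i, j] = vol_i * vol_j * corr[i, j]
    cov = np.outer(vols, vols) * np.asarray(corr, dtype=float)
    return float(np.sqrt(weights @ cov @ weights))


# Hypothetical two-stock portfolio, equally weighted, each stock with 20% volatility.
weights = [0.5, 0.5]
vols = [0.20, 0.20]

high_corr = portfolio_volatility(weights, vols, [[1.0, 0.9], [0.9, 1.0]])
low_corr = portfolio_volatility(weights, vols, [[1.0, 0.1], [0.1, 1.0]])

print(f"Volatility with correlation 0.9: {high_corr:.3f}")
print(f"Volatility with correlation 0.1: {low_corr:.3f}")
```

With identical individual volatilities, the weakly correlated pair gives the lower portfolio volatility, which is why grouping stocks by similarity is useful when picking holdings from different groups.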
Trade&Ahead is a financial consultancy firm that provides its customers with personalized investment strategies. The firm has data comprising the stock price and several financial indicators for a number of companies listed on the New York Stock Exchange. The task is to analyze the data, group the stocks based on the attributes provided, and share insights about the characteristics of each group.
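As a rough preview of the grouping task, the sketch below scales a few numeric attributes, fits k-means, and profiles each group by its mean. It runs on synthetic data rather than the firm's file: the two made-up groups, the choice of three columns, and `n_clusters=2` are assumptions for illustration, not findings from the real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)

# Synthetic stand-in for the real data: two made-up groups of 30 "stocks" each.
calm = rng.normal(loc=[5.0, 1.0, 20.0], scale=[2.0, 0.1, 3.0], size=(30, 3))
turbulent = rng.normal(loc=[-20.0, 3.0, 90.0], scale=[5.0, 0.3, 5.0], size=(30, 3))
toy = pd.DataFrame(
    np.vstack([calm, turbulent]),
    columns=["Price Change", "Volatility", "P/E Ratio"],
)

# Scale with z-scores so no single attribute dominates the distance computation.
scaled = StandardScaler().fit_transform(toy)

# Fit k-means and attach the cluster label to each row.
model = KMeans(n_clusters=2, n_init=10, random_state=1)
toy["cluster"] = model.fit_predict(scaled)

# Silhouette score (between -1 and 1) summarizes how well separated the groups are.
score = silhouette_score(scaled, toy["cluster"])

# Cluster profile: mean of each attribute per group, in the original units.
profile = toy.groupby("cluster").mean()
print(profile.round(2))
print(f"Silhouette score: {score:.2f}")
```

On real data the number of clusters is not known in advance, so it would be chosen with tools such as the elbow curve and silhouette scores rather than fixed up front.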
# Libraries to help with reading and manipulating data
import numpy as np
import pandas as pd
# Libraries to help with data visualization
import matplotlib.pyplot as plt
import seaborn as sns
# to scale the data using z-score
from sklearn.preprocessing import StandardScaler
# to compute distances
from scipy.spatial.distance import cdist
# to perform k-means clustering and compute silhouette scores
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
# to visualize the elbow curve and silhouette scores
from yellowbrick.cluster import KElbowVisualizer, SilhouetteVisualizer
# to compute pairwise distances
from scipy.spatial.distance import pdist
# to perform hierarchical clustering, compute cophenetic correlation, and create dendrograms
from sklearn.cluster import AgglomerativeClustering
from scipy.cluster.hierarchy import dendrogram, linkage, cophenet
# to perform PCA
from sklearn.decomposition import PCA
# to suppress warnings
import warnings
warnings.filterwarnings('ignore')
# loading the dataset
data = pd.read_csv("stock_data.csv")
# copying the data to another variable to avoid any changes to the original data
df = data.copy()
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', 200)
# viewing a random sample of the dataset
np.random.seed(1)
data.sample(200)
| | Ticker Symbol | Security | GICS Sector | GICS Sub Industry | Current Price | Price Change | Volatility | ROE | Cash Ratio | Net Cash Flow | Net Income | Earnings Per Share | Estimated Shares Outstanding | P/E Ratio | P/B Ratio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 102 | DVN | Devon Energy Corp. | Energy | Oil & Gas Exploration & Production | 32.000000 | -15.478079 | 2.923698 | 205 | 70 | 830000000 | -14454000000 | -35.550 | 4.065823e+08 | 93.089287 | 1.785616 |
| 125 | FB | | Information Technology | Internet Software & Services | 104.660004 | 16.224320 | 1.320606 | 8 | 958 | 592000000 | 3669000000 | 1.310 | 2.800763e+09 | 79.893133 | 5.884467 |
| 11 | AIV | Apartment Investment & Mgmt | Real Estate | REITs | 40.029999 | 7.578608 | 1.163334 | 15 | 47 | 21818000 | 248710000 | 1.520 | 1.636250e+08 | 26.335526 | -1.269332 |
| 248 | PG | Procter & Gamble | Consumer Staples | Personal Products | 79.410004 | 10.660538 | 0.806056 | 17 | 129 | 160383000 | 636056000 | 3.280 | 4.913916e+08 | 24.070121 | -2.256747 |
| 238 | OXY | Occidental Petroleum | Energy | Oil & Gas Exploration & Production | 67.610001 | 0.865287 | 1.589520 | 32 | 64 | -588000000 | -7829000000 | -10.230 | 7.652981e+08 | 93.089287 | 3.345102 |
| 336 | YUM | Yum! Brands Inc | Consumer Discretionary | Restaurants | 52.516175 | -8.698917 | 1.478877 | 142 | 27 | 159000000 | 1293000000 | 2.970 | 4.353535e+08 | 17.682214 | -3.838260 |
| 112 | EQT | EQT Corporation | Energy | Oil & Gas Exploration & Production | 52.130001 | -21.253771 | 2.364883 | 2 | 201 | 523803000 | 85171000 | 0.560 | 1.520911e+08 | 93.089287 | 9.567952 |
| 147 | HAL | Halliburton Co. | Energy | Oil & Gas Equipment & Services | 34.040001 | -5.101751 | 1.966062 | 4 | 189 | 7786000000 | -671000000 | -0.790 | 8.493671e+08 | 93.089287 | 17.345857 |
| 89 | DFS | Discover Financial Services | Financials | Consumer Finance | 53.619999 | 3.653584 | 1.159897 | 20 | 99 | 2288000000 | 2297000000 | 5.140 | 4.468872e+08 | 10.431906 | -0.375934 |
| 173 | IVZ | Invesco Ltd. | Financials | Asset Management & Custody Banks | 33.480000 | 7.067477 | 1.580839 | 12 | 67 | 412000000 | 968100000 | 2.260 | 4.283628e+08 | 14.814159 | 4.218620 |
| 117 | ETR | Entergy Corp. | Utilities | Electric Utilities | 68.360001 | 4.910983 | 1.217401 | 2 | 44 | -71065000 | -156734000 | -0.990 | 1.583172e+08 | 18.456543 | 6.174024 |
| 230 | NSC | Norfolk Southern Corp. | Industrials | Railroads | 84.589996 | 9.529966 | 2.168814 | 13 | 49 | 128000000 | 1556000000 | 5.130 | 3.033138e+08 | 16.489278 | 0.926433 |
| 123 | F | Ford Motor | Consumer Discretionary | Automobile Manufacturers | 14.090000 | 2.398256 | 1.151454 | 26 | 43 | 3515000000 | 7373000000 | 1.860 | 3.963978e+09 | 7.575269 | 5.108756 |
| 161 | HST | Host Hotels & Resorts | Real Estate | REITs | 15.340000 | -3.217666 | 1.594628 | 8 | 47 | -445000000 | 558000000 | 0.220 | 2.536364e+09 | 69.727273 | -0.113548 |
| 4 | ADI | Analog Devices, Inc. | Information Technology | Semiconductors | 55.320000 | -1.827858 | 1.701169 | 14 | 272 | 315120000 | 696878000 | 0.310 | 2.247994e+09 | 178.451613 | 1.059810 |
| 246 | PFE | Pfizer Inc. | Health Care | Pharmaceuticals | 32.279999 | 3.130991 | 1.238748 | 11 | 79 | 298000000 | 6960000000 | 1.130 | 6.159292e+09 | 28.566371 | -4.213309 |
| 290 | TGNA | Tegna, Inc. | Consumer Discretionary | Publishing | 25.520000 | 13.624226 | 1.797269 | 21 | 21 | 10716000 | 459522000 | 2.040 | 2.252559e+08 | 12.509804 | -12.726553 |
| 275 | SNI | Scripps Networks Interactive Inc. | Consumer Discretionary | Broadcasting & Cable TV | 55.209999 | 12.238260 | 1.773865 | 40 | 23 | -654720000 | 606828000 | 4.680 | 1.296641e+08 | 11.797008 | -7.961579 |
| 286 | SYK | Stryker Corp. | Health Care | Health Care Equipment | 92.940002 | -1.650792 | 1.138163 | 17 | 116 | 1584000000 | 1439000000 | 3.820 | 3.767016e+08 | 24.329843 | 7.026782 |
| 219 | MUR | Murphy Oil | Energy | Integrated Oil & Gas | 22.450001 | -8.591197 | 2.851180 | 43 | 27 | -910125000 | -2270833000 | -13.030 | 1.742773e+08 | 28.407929 | -1.298006 |
| 305 | UNM | Unum Group | Financials | Diversified Financial Services | 33.290001 | 3.804181 | 1.102848 | 10 | 117 | 10400000 | 867100000 | 3.510 | 2.470370e+08 | 9.484331 | -4.178927 |
| 119 | EXC | Exelon Corp. | Utilities | MultiUtilities | 27.770000 | -6.403775 | 1.351595 | 9 | 74 | 4624000000 | 2269000000 | 2.685 | 2.998871e+08 | 17.313076 | -1.715880 |
| 251 | PM | Philip Morris International | Consumer Staples | Tobacco | 87.910004 | 10.328820 | 0.861453 | 52 | 22 | 1735000000 | 6873000000 | 4.420 | 1.554977e+09 | 19.889141 | -1.418027 |
| 6 | ADS | Alliance Data Systems | Information Technology | Data Processing & Outsourced Services | 276.570007 | 6.189286 | 1.116976 | 30 | 25 | 90885000 | 596541000 | 8.910 | 6.695185e+07 | 31.040405 | 129.064585 |
| 337 | ZBH | Zimmer Biomet Holdings | Health Care | Health Care Equipment | 102.589996 | 9.347683 | 1.404206 | 1 | 100 | 376000000 | 147000000 | 0.780 | 1.884615e+08 | 131.525636 | -23.884449 |
| 122 | EXR | Extra Space Storage | Real Estate | Specialized REITs | 88.209999 | 13.922251 | 1.186059 | 19 | 39 | 28136000 | 394950000 | 1.580 | 2.499684e+08 | 55.829113 | -14.151445 |
| 288 | TAP | Molson Coors Brewing Company | Consumer Staples | Brewers | 93.919998 | 13.129368 | 1.217803 | 5 | 35 | -150300000 | 359500000 | 1.940 | 1.853093e+08 | 48.412370 | -25.385129 |
| 250 | PHM | Pulte Homes Inc. | Consumer Discretionary | Homebuilding | 17.820000 | -5.564393 | 1.694751 | 10 | 25 | -533785000 | 494090000 | 1.380 | 3.580362e+08 | 12.913043 | -0.307832 |
| 315 | VRTX | Vertex Pharmaceuticals Inc | Health Care | Biotechnology | 125.830002 | 21.928300 | 2.456535 | 59 | 221 | 89509000 | -556334000 | -2.310 | 2.408372e+08 | 39.602928 | 2.559671 |
| 267 | RRC | Range Resources Corp. | Energy | Oil & Gas Exploration & Production | 24.610001 | -25.106512 | 3.712995 | 26 | 0 | 23000 | -713685000 | -4.290 | 1.663601e+08 | 93.089287 | 0.525090 |
| 197 | MAC | Macerich | Real Estate | Retail REITs | 80.690002 | 4.183351 | 1.169328 | 10 | 47 | 1603000 | 487562000 | 3.080 | 1.582994e+08 | 26.198053 | -3.973395 |
| 177 | JPM | JPMorgan Chase & Co. | Financials | Banks | 66.029999 | 8.033377 | 1.130337 | 10 | 99 | -7341000000 | 24442000000 | 6.050 | 4.040000e+09 | 10.914049 | -1.886881 |
| 228 | NLSN | Nielsen Holdings | Industrials | Research & Consulting Services | 46.599998 | 4.931317 | 1.198493 | 13 | 21 | 84000000 | 570000000 | 1.550 | 3.677419e+08 | 30.064515 | -12.375526 |
| 107 | EIX | Edison Int'l | Utilities | Electric Utilities | 59.209999 | -6.135071 | 0.927260 | 10 | 3 | 29000000 | 1117000000 | 3.130 | 3.568690e+08 | 18.916933 | -6.369284 |
| 18 | ALLE | Allegion | Industrials | Building Products | 65.919998 | 13.753230 | 1.283795 | 601 | 45 | -90800000 | 153900000 | 1.600 | 9.618750e+07 | 41.199999 | -0.877453 |
| 314 | VRSN | Verisign Inc. | Information Technology | Internet Software & Services | 87.360001 | 23.459580 | 1.379480 | 35 | 127 | 37051000 | 375236000 | 3.290 | 1.140535e+08 | 26.553192 | 4.076543 |
| 59 | CF | CF Industries Holdings Inc | Materials | Fertilizers & Agricultural Chemicals | 40.810001 | -9.250611 | 2.368186 | 16 | 25 | -1710600000 | 664900000 | 2.970 | 2.238721e+08 | 13.740741 | -0.393528 |
| 131 | FLIR | FLIR Systems | Information Technology | Electronic Equipment & Instruments | 28.070000 | 0.214209 | 1.761193 | 15 | 81 | -58589000 | 241686000 | 1.730 | 1.397029e+08 | 16.225434 | 4.014713 |
| 234 | O | Realty Income Corporation | Real Estate | Retail REITs | 51.630001 | 8.420836 | 1.104581 | 4 | 47 | 36442000 | 283766000 | 1.090 | 2.603358e+08 | 47.366973 | -3.973395 |
| 120 | EXPD | Expeditors Int'l | Industrials | Air Freight & Logistics | 45.099998 | -4.449159 | 1.062553 | 27 | 94 | -119311000 | 457223000 | 2.420 | 1.889351e+08 | 18.636363 | 5.991459 |
| 303 | UHS | Universal Health Services, Inc. | Health Care | Health Care Facilities | 119.489998 | -5.136552 | 2.048697 | 16 | 6 | 29159000 | 680528000 | 6.890 | 9.877039e+07 | 17.342525 | 6.255903 |
| 159 | HRL | Hormel Foods Corp. | Consumer Staples | Packaged Foods & Meats | 39.540001 | 24.496225 | 1.078455 | 17 | 29 | 13065000 | 686088000 | 3.280 | 1.992378e+08 | 24.070121 | -1.980483 |
| 158 | HPQ | HP Inc. | Information Technology | Computer Hardware | 11.840000 | 2.161759 | 2.373359 | 16 | 18 | 2300000000 | 4554000000 | 1.800 | 5.139877e+08 | 25.309524 | 3.954975 |
| 207 | MKC | McCormick & Co. | Consumer Staples | Packaged Foods & Meats | 85.559998 | 6.976738 | 1.032221 | 24 | 9 | 35300000 | 401600000 | 3.140 | 1.278981e+08 | 27.248407 | -1.980483 |
| 321 | WHR | Whirlpool Corp. | Consumer Discretionary | Household Appliances | 146.869995 | -0.230971 | 2.397803 | 17 | 10 | -254000000 | 783000000 | 9.950 | 7.869347e+07 | 14.760804 | -45.086335 |
| 95 | DLPH | Delphi Automotive | Consumer Discretionary | Auto Parts & Equipment | 85.730003 | 12.109326 | 1.440884 | 64 | 14 | -325000000 | 1450000000 | 5.080 | 2.854331e+08 | 16.875985 | -0.662152 |
| 80 | CTSH | Cognizant Technology Solutions | Information Technology | IT Consulting & Other Services | 60.020000 | -4.654489 | 1.338123 | 17 | 182 | 115100000 | 1623600000 | 2.670 | 6.080899e+08 | 22.479401 | 7.121644 |
| 188 | LLY | Lilly (Eli) & Co. | Health Care | Pharmaceuticals | 84.260002 | 0.789478 | 1.440622 | 17 | 54 | -205200000 | 2408400000 | 2.270 | 1.060969e+09 | 37.118944 | -0.651103 |
| 332 | XRAY | Dentsply Sirona | Health Care | Health Care Supplies | 60.849998 | 19.901474 | 1.007230 | 11 | 60 | 133000000 | 251200000 | 1.790 | 1.403352e+08 | 33.994412 | 0.855096 |
| 92 | DIS | The Walt Disney Company | Consumer Discretionary | Broadcasting & Cable TV | 105.080002 | 2.049141 | 1.188454 | 19 | 26 | 848000000 | 8382000000 | 4.950 | 1.693333e+09 | 21.228283 | -3.985039 |
| 268 | RSG | Republic Services Inc | Industrials | Industrial Conglomerates | 43.990002 | 6.745943 | 0.839821 | 10 | 2 | -42800000 | 749900000 | 2.140 | 3.504206e+08 | 20.556076 | -2.428225 |
| 65 | CI | CIGNA Corp. | Health Care | Managed Health Care | 146.330002 | 8.682415 | 1.588398 | 17 | 70 | 548000000 | 2094000000 | 8.170 | 2.563035e+08 | 17.910649 | -8.805281 |
| 300 | UAA | Under Armour | Consumer Discretionary | Apparel, Accessories & Luxury Goods | 80.610001 | -16.948277 | 1.758824 | 14 | 27 | -463323000 | 232573000 | 3.030 | 2.135983e+08 | 20.819876 | -0.857290 |
| 291 | TMK | Torchmark Corp. | Financials | Life & Health Insurance | 57.160000 | 1.168142 | 1.022968 | 13 | 99 | -4636000 | 527100000 | 4.210 | 1.252019e+08 | 13.577197 | -1.883912 |
| 12 | AIZ | Assurant Inc | Financials | Multi-line Insurance | 80.540001 | 1.897773 | 1.112604 | 3 | 99 | -30351000 | 141555000 | 2.080 | 6.805529e+07 | 38.721154 | -4.072615 |
| 201 | MCD | McDonald's Corp. | Consumer Discretionary | Restaurants | 118.139999 | 19.939085 | 0.733163 | 64 | 260 | 5607600000 | 4529300000 | 4.820 | 9.396888e+08 | 24.510373 | 7.122145 |
| 27 | AN | AutoNation Inc | Consumer Discretionary | Specialty Stores | 59.660000 | 2.350316 | 1.480914 | 19 | 1 | -1300000 | 442600000 | 3.930 | 1.126209e+08 | 15.180662 | -7.970104 |
| 233 | NWL | Newell Brands | Consumer Discretionary | Housewares & Specialties | 44.080002 | 9.980039 | 1.641300 | 19 | 14 | 75400000 | 350000000 | 1.300 | 2.692308e+08 | 33.907694 | -2.075543 |
| 339 | ZTS | Zoetis | Health Care | Pharmaceuticals | 47.919998 | 16.678836 | 1.610285 | 32 | 65 | 272000000 | 339000000 | 0.680 | 4.985294e+08 | 70.470585 | 1.723068 |
| 211 | MNST | Monster Beverage | Consumer Staples | Soft Drinks | 49.653332 | 10.800357 | 1.585944 | 11 | 568 | 1805094000 | 546733000 | 3.710 | 1.469542e+09 | 25.420118 | -5.190734 |
| 29 | AON | Aon plc | Financials | Insurance Brokers | 92.209999 | 3.910301 | 1.105032 | 23 | 99 | 10000000 | 1385000000 | 4.930 | 2.809331e+08 | 18.703854 | -7.759856 |
| 91 | DHR | Danaher Corp. | Industrials | Industrial Conglomerates | 70.416985 | 8.924595 | 1.191466 | 14 | 13 | -2214800000 | 3357400000 | 4.810 | 6.980042e+08 | 14.639706 | -13.759230 |
| 236 | OMC | Omnicom Group | Consumer Discretionary | Advertising | 75.660004 | 14.810321 | 1.066369 | 45 | 18 | 217100000 | 1093900000 | 4.430 | 2.469300e+08 | 17.079008 | -10.464098 |
| 93 | DISCA | Discovery Communications-A | Consumer Discretionary | Cable & Satellite | 26.680000 | 2.026769 | 1.689235 | 19 | 25 | 23000000 | 1034000000 | -2.430 | 1.115226e+08 | 20.819876 | -76.119077 |
| 58 | CELG | Celgene Corp. | Health Care | Biotechnology | 119.760002 | 8.448793 | 2.000828 | 27 | 333 | 758700000 | 1602000000 | 2.020 | 7.930693e+08 | 59.287130 | -4.320051 |
| 132 | FLR | Fluor Corp. | Industrials | Diversified Commercial Services | 47.220001 | 10.819056 | 1.774454 | 14 | 73 | -43239000 | 412512000 | 2.850 | 1.447411e+08 | 16.568421 | 14.992623 |
| 90 | DGX | Quest Diagnostics | Health Care | Health Care Facilities | 71.139999 | 15.674795 | 1.381490 | 15 | 11 | -59000000 | 709000000 | 4.920 | 1.441057e+08 | 14.459349 | -4.552214 |
| 320 | WFC | Wells Fargo | Financials | Banks | 54.360001 | 5.532912 | 0.969774 | 12 | 99 | -460000000 | 22894000000 | 4.180 | 5.477033e+09 | 13.004785 | -0.938007 |
| 312 | VNO | Vornado Realty Trust | Real Estate | REITs | 99.959999 | 10.027519 | 1.019724 | 11 | 47 | 637230000 | 760434000 | 3.610 | 2.106465e+08 | 27.689750 | -1.081912 |
| 85 | D | Dominion Resources | Utilities | Electric Utilities | 67.639999 | -3.988642 | 0.889931 | 15 | 8 | 289000000 | 1899000000 | 3.210 | 5.915888e+08 | 21.071651 | -7.604945 |
| 73 | CNC | Centene Corporation | Health Care | Managed Health Care | 65.809998 | 21.712591 | 2.298696 | 16 | 70 | 150000000 | 355000000 | 2.990 | 1.187291e+08 | 22.010033 | -1.305493 |
| 287 | T | AT&T Inc | Telecommunications Services | Integrated Telecommunications Services | 34.410000 | 5.942118 | 0.859442 | 11 | 11 | -3482000000 | 13345000000 | 2.370 | 5.630802e+09 | 14.518987 | -23.537323 |
| 304 | UNH | United Health Group Inc. | Health Care | Managed Health Care | 117.639999 | 1.466273 | 1.482349 | 17 | 70 | 3428000000 | 5813000000 | 6.100 | 9.529508e+08 | 19.285246 | -8.805281 |
| 165 | IDXX | IDEXX Laboratories | Health Care | Health Care Equipment | 72.919998 | -1.565880 | 1.469586 | 228 | 40 | -193542000 | 192078000 | 2.070 | 9.279130e+07 | 35.227052 | -0.981083 |
| 328 | XEC | Cimarex Energy | Energy | Oil & Gas Exploration & Production | 89.379997 | -14.403372 | 2.397940 | 86 | 190 | 373520000 | -2408948000 | -25.920 | 9.293781e+07 | 93.089287 | 7.186128 |
| 167 | INTC | Intel Corp. | Information Technology | Semiconductors | 34.450001 | 14.035095 | 1.226022 | 19 | 162 | 12747000000 | 11420000000 | 2.410 | 4.738589e+09 | 14.294606 | 3.954975 |
| 14 | AKAM | Akamai Technologies Inc | Information Technology | Internet Software & Services | 52.630001 | -23.790903 | 1.384502 | 10 | 225 | 50823000 | 321406000 | 1.800 | 1.785589e+08 | 29.238889 | 4.282358 |
| 244 | PEG | Public Serv. Enterprise Inc. | Utilities | Electric Utilities | 38.689999 | -8.230553 | 1.180661 | 13 | 11 | -8000000 | 1679000000 | 3.320 | 5.057229e+08 | 11.653614 | -0.361858 |
| 105 | ED | Consolidated Edison | Utilities | Electric Utilities | 64.269997 | -3.974306 | 1.068002 | 9 | 20 | 249000000 | 1193000000 | 4.070 | 2.931204e+08 | 15.791154 | -3.022649 |
| 199 | MAS | Masco Corp. | Industrials | Building Products | 28.299999 | 11.637077 | 1.428359 | 263 | 61 | 85000000 | 355000000 | 1.030 | 3.446602e+08 | 27.475727 | 2.219577 |
| 51 | BXP | Boston Properties | Real Estate | REITs | 127.540001 | 7.203497 | 1.089469 | 10 | 47 | -1039361000 | 583106000 | 3.790 | 1.538538e+08 | 33.651715 | -1.269332 |
| 9 | AFL | AFLAC Inc | Financials | Life & Health Insurance | 59.900002 | 3.027181 | 1.048295 | 14 | 99 | -308000000 | 2533000000 | 5.880 | 4.307823e+08 | 10.187075 | -1.883912 |
| 127 | FCX | Freeport-McMoran Cp & Gld | Materials | Copper | 6.770000 | -31.685167 | 3.796410 | 155 | 5 | -240000000 | -12156000000 | -11.310 | 1.074801e+09 | 22.811951 | 2.935427 |
| 16 | ALK | Alaska Air Group Inc | Industrials | Airlines | 80.510002 | 2.066436 | 1.773431 | 35 | 74 | -34000000 | 848000000 | 6.610 | 1.282905e+08 | 12.180031 | -1.114658 |
| 0 | AAL | American Airlines Group | Industrials | Airlines | 42.349998 | 9.999995 | 1.687151 | 135 | 51 | -604000000 | 7610000000 | 11.390 | 6.681299e+08 | 3.718174 | -8.784219 |
| 284 | SWN | Southwestern Energy | Energy | Oil & Gas Exploration & Production | 7.110000 | -44.798137 | 4.580042 | 200 | 2 | -38000000 | -4556000000 | -6.070 | 4.021417e+08 | 93.089287 | 1.273530 |
| 272 | SEE | Sealed Air | Materials | Paper Packaging | 44.599998 | -5.146750 | 1.580117 | 43 | 19 | 73000000 | 225400000 | 1.630 | 1.382822e+08 | 27.361962 | -2.716908 |
| 62 | CHK | Chesapeake Energy | Energy | Integrated Oil & Gas | 4.500000 | -38.101788 | 4.559815 | 687 | 22 | -3283000000 | -14685000000 | -22.430 | 6.547035e+08 | 28.407929 | -1.840528 |
| 185 | LH | Laboratory Corp. of America Holding | Health Care | Health Care Facilities | 123.639999 | 14.174899 | 1.603130 | 9 | 42 | 136400000 | 436900000 | 5.030 | 1.782465e+08 | 15.900937 | -1.294844 |
| 169 | IPG | Interpublic Group | Consumer Discretionary | Advertising | 23.280001 | 21.821035 | 1.139799 | 23 | 20 | -157700000 | 454600000 | 1.110 | 4.095495e+08 | 20.972974 | 0.265658 |
| 192 | LUV | Southwest Airlines | Industrials | Airlines | 43.060001 | 13.855106 | 1.536290 | 30 | 41 | 301000000 | 2181000000 | 3.300 | 6.609091e+08 | 13.048485 | -5.117194 |
| 70 | CMG | Chipotle Mexican Grill | Consumer Discretionary | Restaurants | 479.850006 | -33.131268 | 2.474002 | 22 | 237 | -171460000 | 475602000 | 15.300 | 3.108510e+07 | 31.362745 | 17.201329 |
| 206 | MJN | Mead Johnson | Consumer Staples | Packaged Foods & Meats | 78.949997 | 12.081196 | 1.718403 | 103 | 136 | 403700000 | 653500000 | 3.280 | 1.992378e+08 | 24.070121 | 6.495755 |
| 111 | EQR | Equity Residential | Real Estate | REITs | 81.589996 | 8.037605 | 1.056186 | 8 | 47 | 2196000 | 870120000 | 2.370 | 3.671392e+08 | 34.426159 | -1.269332 |
| 41 | BAX | Baxter International Inc. | Health Care | Health Care Equipment | 38.150002 | 16.702365 | 1.204526 | 11 | 128 | -712000000 | 968000000 | 1.780 | 5.438202e+08 | 21.432585 | 8.637045 |
| 249 | PGR | Progressive Corp. | Financials | Property & Casualty Insurance | 31.799999 | 3.515625 | 1.086898 | 17 | 99 | 116000000 | 1267600000 | 2.160 | 5.868519e+08 | 14.722222 | -0.843313 |
| 205 | MHK | Mohawk Industries | Consumer Discretionary | Home Furnishings | 189.389999 | 3.514425 | 1.492478 | 13 | 3 | -16185000 | 615302000 | 2.590 | 2.375683e+08 | 73.123552 | -3.980316 |
| 271 | SE | Spectra Energy Corp. | Energy | Oil & Gas Refining & Marketing & Transportation | 23.940001 | -9.898378 | 2.030786 | 3 | 6 | -2000000 | 196000000 | 0.290 | 6.758621e+08 | 82.551728 | -2.580408 |
| 242 | PCG | PG&E Corp. | Utilities | MultiUtilities | 53.189999 | 0.510206 | 1.039803 | 5 | 6 | -28000000 | 888000000 | 1.810 | 4.906077e+08 | 29.386740 | -1.121059 |
| 338 | ZION | Zions Bancorp | Financials | Regional Banks | 27.299999 | -1.158588 | 1.468176 | 4 | 99 | -43623000 | 309471000 | 1.200 | 2.578925e+08 | 22.749999 | -0.063096 |
| 39 | BA | Boeing Company | Industrials | Aerospace & Defense | 144.589996 | 10.105078 | 1.155905 | 82 | 24 | -431000000 | 5176000000 | 7.520 | 6.882979e+08 | 19.227393 | 22.032612 |
| 106 | EFX | Equifax Inc. | Industrials | Research & Consulting Services | 111.370003 | 14.531063 | 1.081040 | 19 | 15 | -35000000 | 429100000 | 3.610 | 1.188643e+08 | 30.850416 | -8.116821 |
| 78 | CSX | CSX Corp. | Industrials | Railroads | 25.950001 | -4.349421 | 1.626219 | 17 | 74 | -41000000 | 1968000000 | 2.000 | 9.840000e+08 | 12.975001 | 0.902439 |
| 231 | NTRS | Northern Trust Corp. | Financials | Asset Management & Custody Banks | 72.089996 | 5.796884 | 1.281566 | 11 | 67 | 3394000000 | 973800000 | 4.030 | 2.416377e+08 | 17.888336 | -13.398380 |
| 273 | SHW | Sherwin-Williams | Materials | Specialty Chemicals | 259.600006 | 16.537983 | 1.426488 | 121 | 10 | 165012000 | 1053849000 | 11.380 | 9.260536e+07 | 22.811951 | 2.825366 |
| 322 | WM | Waste Management Inc. | Industrials | Environmental Services | 53.369999 | 7.061186 | 0.940366 | 14 | 2 | -1268000000 | 753000000 | 1.660 | 4.536145e+08 | 32.150602 | -1.415299 |
| 17 | ALL | Allstate Corp | Financials | Property & Casualty Insurance | 62.090000 | 6.592275 | 1.053266 | 11 | 99 | -162000000 | 2171000000 | 5.120 | 4.240234e+08 | 12.126953 | -4.327138 |
| 256 | PPL | PPL Corp. | Utilities | Electric Utilities | 34.130001 | 3.424245 | 1.109059 | 7 | 22 | -563000000 | 682000000 | 1.010 | 6.752475e+08 | 33.792080 | -2.827111 |
| 223 | NDAQ | NASDAQ OMX Group | Financials | Diversified Financial Services | 58.169998 | 8.810324 | 1.563258 | 8 | 117 | -126000000 | 428000000 | 2.560 | 1.671875e+08 | 22.722655 | -11.717383 |
| 88 | DE | Deere & Co. | Industrials | Construction & Farm Machinery & Heavy Trucks | 76.269997 | 3.952561 | 1.551946 | 29 | 22 | 375200000 | 1940000000 | 4.030 | 5.322359e+08 | 14.842233 | 6.277287 |
| 292 | TMO | Thermo Fisher Scientific | Health Care | Health Care Equipment | 141.850006 | 15.607180 | 1.247751 | 9 | 11 | -891400000 | 1975400000 | 4.960 | 3.982661e+08 | 28.598792 | -28.032512 |
| 150 | HCA | HCA Holdings | Health Care | Health Care Facilities | 67.629997 | -12.532337 | 1.914907 | 28 | 13 | 175000000 | 2129000000 | 5.140 | 4.142023e+08 | 13.157587 | -7.279051 |
| 42 | BBT | BB&T Corporation | Financials | Banks | 37.810001 | 5.940045 | 1.077678 | 8 | 99 | 1386000000 | 2084000000 | 2.590 | 8.046332e+08 | 14.598456 | -0.852562 |
| 180 | KMI | Kinder Morgan | Energy | Oil & Gas Refining & Marketing & Transportation | 14.920000 | -47.129693 | 3.139352 | 1 | 7 | -86000000 | 253000000 | 0.100 | 2.530000e+09 | 149.200000 | -1.894071 |
| 171 | ISRG | Intuitive Surgical Inc. | Health Care | Health Care Equipment | 546.159973 | 18.733013 | 1.126009 | 14 | 317 | 114300000 | 588800000 | 15.870 | 3.710145e+07 | 34.414617 | 42.607500 |
| 138 | GD | General Dynamics | Industrials | Aerospace & Defense | 137.360001 | -0.463767 | 0.939544 | 28 | 22 | -1603000000 | 2965000000 | 9.230 | 3.212351e+08 | 14.881907 | 4.242998 |
| 162 | HSY | The Hershey Company | Consumer Staples | Packaged Foods & Meats | 89.269997 | -3.261814 | 1.188383 | 51 | 16 | -28325000 | 512951000 | 3.280 | 1.992378e+08 | 24.070121 | -1.980483 |
| 191 | LUK | Leucadia National Corp. | Financials | Multi-Sector Holdings | 17.389999 | -14.292764 | 1.554235 | 2 | 81 | -638127000 | 252111000 | 0.740 | 3.406905e+08 | 23.499999 | 19.821416 |
| 5 | ADM | Archer-Daniels-Midland Co | Consumer Staples | Agricultural Products | 36.680000 | -12.017268 | 1.516493 | 10 | 49 | -189000000 | 1849000000 | 2.990 | 6.183946e+08 | 12.267559 | 7.496831 |
| 224 | NEE | NextEra Energy | Utilities | MultiUtilities | 103.889999 | 6.237855 | 1.023375 | 12 | 6 | -6000000 | 2752000000 | 6.110 | 4.504092e+08 | 17.003273 | -7.353314 |
| 174 | JBHT | J. B. Hunt Transport Services | Industrials | Trucking | 73.360001 | 2.961405 | 1.218373 | 33 | 1 | -395000 | 427235000 | 3.690 | 1.157818e+08 | 19.880759 | 2.823845 |
| 38 | AXP | American Express Co | Financials | Consumer Finance | 69.550003 | -6.216290 | 0.900066 | 25 | 99 | 474000000 | 5163000000 | 3.900 | 5.066604e+08 | 10.263506 | -0.609074 |
| 232 | NUE | Nucor Corp. | Materials | Steel | 40.299999 | 6.585554 | 1.460619 | 5 | 147 | 915325000 | 357659000 | 1.110 | 3.222153e+08 | 36.306305 | 11.168107 |
| 179 | KMB | Kimberly-Clark | Consumer Staples | Household Products | 127.300003 | 17.511309 | 0.870405 | 582 | 10 | -170000000 | 1013000000 | 2.780 | 3.643885e+08 | 45.791368 | -2.533011 |
| 163 | HUM | Humana Inc. | Health Care | Managed Health Care | 178.509995 | -0.145443 | 1.615206 | 12 | 70 | 636000000 | 1276000000 | 8.540 | 1.494145e+08 | 20.902810 | -8.805281 |
| 139 | GGP | General Growth Properties Inc. | Real Estate | Retail REITs | 27.209999 | 4.212937 | 1.390342 | 17 | 47 | -15576000 | 1374561000 | 3.040 | 1.582994e+08 | 47.366973 | -3.973395 |
| 67 | CL | Colgate-Palmolive | Consumer Staples | Household Products | 66.620003 | 4.781379 | 0.895471 | 463 | 27 | -119000000 | 1384000000 | 1.530 | 9.045752e+08 | 43.542486 | -0.548324 |
| 187 | LLL | L-3 Communications Holdings | Industrials | Industrial Conglomerates | 119.510002 | 14.539013 | 1.513434 | 6 | 7 | -235000000 | -240000000 | -2.970 | 8.080808e+07 | 17.334711 | 14.280750 |
| 146 | GWW | Grainger (W.W.) Inc. | Industrials | Industrial Materials | 202.589996 | -5.336199 | 1.348597 | 34 | 16 | 63492000 | 768996000 | 11.690 | 6.578238e+07 | 17.330196 | 12.112879 |
| 100 | DUK | Duke Energy | Utilities | Electric Utilities | 71.389999 | -0.833448 | 1.096727 | 7 | 8 | -1179000000 | 2816000000 | 4.050 | 6.953086e+08 | 17.627160 | -4.426811 |
| 34 | ATVI | Activision Blizzard | Information Technology | Home Entertainment Software | 38.709999 | 23.319529 | 1.886335 | 11 | 70 | -3025000000 | 892000000 | 1.210 | 7.371901e+08 | 31.991735 | 0.290291 |
| 110 | EQIX | Equinix | Real Estate | REITs | 302.399994 | 10.019650 | 1.308082 | 7 | 164 | 1617921000 | 187774000 | 3.250 | 5.777662e+07 | 93.046152 | 23.856728 |
| 221 | NAVI | Navient | Financials | Consumer Finance | 11.450000 | 1.868327 | 2.230827 | 25 | 99 | 151000000 | 997000000 | 2.660 | 3.748120e+08 | 4.304511 | -1.880943 |
| 319 | WEC | Wec Energy Group Inc | Utilities | Electric Utilities | 51.310001 | -1.986623 | 1.103033 | 7 | 2 | -12100000 | 640300000 | 2.360 | 2.713136e+08 | 21.741526 | -1.850995 |
| 19 | ALXN | Alexion Pharmaceuticals | Health Care | Biotechnology | 190.750000 | 22.338380 | 2.022921 | 2 | 195 | 66000000 | 144000000 | 0.680 | 2.117647e+08 | 280.514706 | -14.171389 |
| 212 | MO | Altria Group Inc | Consumer Staples | Tobacco | 58.209999 | 6.885788 | 0.959008 | 182 | 33 | -952000000 | 5241000000 | 2.670 | 1.962921e+09 | 21.801498 | -6.632971 |
| 44 | BHI | Baker Hughes Inc | Energy | Oil & Gas Equipment & Services | 46.150002 | -12.312367 | 2.559553 | 12 | 84 | 584000000 | -1967000000 | -4.490 | 4.380846e+08 | 93.089287 | 13.490544 |
| 208 | MLM | Martin Marietta Materials | Materials | Construction Materials | 136.580002 | -10.866015 | 2.164150 | 7 | 46 | 59758000 | 288792000 | 4.310 | 6.700510e+07 | 31.689096 | 3.050887 |
| 175 | JEC | Jacobs Engineering Group | Industrials | Industrial Conglomerates | 41.950001 | 11.539484 | 1.732990 | 7 | 23 | -271788000 | 302971000 | 2.420 | 1.251946e+08 | 17.334711 | 6.294943 |
| 186 | LKQ | LKQ Corporation | Consumer Discretionary | Distributors | 29.629999 | 4.441304 | 1.427237 | 14 | 12 | -27208000 | 423223000 | 3.030 | 2.135983e+08 | 20.819876 | -0.857290 |
| 108 | EMN | Eastman Chemical | Materials | Diversified Chemicals | 67.510002 | 3.654238 | 1.404508 | 22 | 14 | 79000000 | 848000000 | 5.710 | 1.485114e+08 | 11.823118 | -12.308821 |
| 296 | TSN | Tyson Foods | Consumer Staples | Packaged Foods & Meats | 53.330002 | 23.249369 | 1.586719 | 13 | 19 | 250000000 | 1220000000 | 3.280 | 1.992378e+08 | 24.070121 | -1.980483 |
| 79 | CTL | CenturyLink Inc | Telecommunications Services | Integrated Telecommunications Services | 25.160000 | 0.159232 | 1.522194 | 6 | 3 | -2000000 | 878000000 | 1.580 | 5.556962e+08 | 15.924051 | -13.383212 |
| 323 | WMB | Williams Cos. | Energy | Oil & Gas Exploration & Production | 25.700001 | -30.988186 | 3.719560 | 9 | 4 | -140000000 | -571000000 | -0.760 | 7.513158e+08 | 93.089287 | -14.561121 |
| 260 | PX | Praxair Inc. | Materials | Industrial Gases | 102.400002 | 0.293834 | 1.131240 | 35 | 8 | 21000000 | 1547000000 | 5.390 | 2.870130e+08 | 18.998145 | 0.574887 |
| 8 | AEP | American Electric Power | Utilities | Electric Utilities | 58.270000 | 2.371753 | 1.068485 | 11 | 9 | 13900000 | 2052300000 | 3.130 | 4.218978e+08 | 18.456543 | -3.022649 |
| 154 | HIG | Hartford Financial Svc.Gp. | Financials | Property & Casualty Insurance | 43.459999 | -5.005467 | 1.147332 | 10 | 99 | 49000000 | 1682000000 | 4.050 | 4.153086e+08 | 10.730864 | -4.327138 |
| 99 | DPS | Dr Pepper Snapple Group | Consumer Staples | Soft Drinks | 93.199997 | 18.049399 | 1.150797 | 35 | 58 | 683000000 | 764000000 | 4.000 | 1.910000e+08 | 23.299999 | -12.717277 |
| 299 | TXN | Texas Instruments | Information Technology | Semiconductors | 54.810001 | 9.971912 | 1.263479 | 30 | 126 | -199000000 | 2986000000 | 2.860 | 1.044056e+09 | 19.164336 | 2.768051 |
| 329 | XEL | Xcel Energy Inc | Utilities | MultiUtilities | 35.910000 | 1.383405 | 1.015052 | 9 | 2 | 5332000 | 984485000 | 1.940 | 5.074665e+08 | 18.510309 | -2.261927 |
| 274 | SLG | SL Green Realty | Real Estate | Office REITs | 112.980003 | 4.004424 | 1.091967 | 4 | 47 | -26010000 | 284084000 | 1.020 | 2.785137e+08 | 110.764709 | -3.089477 |
| 28 | ANTM | Anthem Inc. | Health Care | Managed Health Care | 139.440002 | -0.620053 | 1.511654 | 11 | 70 | -38200000 | 2560000000 | 9.730 | 2.631038e+08 | 14.330935 | -31.006773 |
| 31 | APC | Anadarko Petroleum Corp | Energy | Oil & Gas Exploration & Production | 48.580002 | -20.802083 | 2.435165 | 52 | 22 | -6430000000 | -6692000000 | -13.180 | 5.077390e+08 | 93.089287 | -12.860938 |
| 55 | CBG | CBRE Group | Real Estate | Real Estate Services | 34.580002 | 8.197757 | 1.297857 | 20 | 12 | -200481000 | 547132000 | 1.640 | 3.336171e+08 | 21.085367 | -3.415302 |
| 32 | APH | Amphenol Corp | Information Technology | Electronic Components | 52.230000 | 2.693667 | 1.007762 | 24 | 175 | 768300000 | 763500000 | 2.470 | 3.091093e+08 | 21.145749 | 8.202923 |
| 213 | MOS | The Mosaic Company | Materials | Fertilizers & Agricultural Chemicals | 27.590000 | -11.229086 | 2.830675 | 10 | 62 | -1098300000 | 1000400000 | 2.790 | 3.585663e+08 | 9.888889 | 5.846617 |
| 48 | BMY | Bristol-Myers Squibb | Health Care | Health Care Distributors | 68.790001 | 16.081680 | 1.498872 | 11 | 53 | -3186000000 | 1565000000 | 0.940 | 1.664894e+09 | 73.180852 | 0.588026 |
| 283 | SWKS | Skyworks Solutions | Information Technology | Semiconductors | 76.830002 | -8.513933 | 2.017394 | 25 | 225 | 237800000 | 798300000 | 4.210 | 1.896200e+08 | 18.249407 | 7.413777 |
| 293 | TRIP | TripAdvisor | Consumer Discretionary | Internet & Direct Marketing Retail | 85.250000 | 34.803917 | 1.578344 | 14 | 212 | 159000000 | 198000000 | 1.380 | 1.434783e+08 | 61.775362 | 2.627576 |
| 33 | ARNC | Arconic Inc | Industrials | Aerospace & Defense | 7.398807 | 1.647784 | 2.592065 | 3 | 37 | 42000000 | -322000000 | -0.310 | 1.038710e+09 | 18.687607 | 2.639814 |
| 35 | AVB | AvalonBay Communities, Inc. | Real Estate | Residential REITs | 184.130005 | 4.857630 | 1.132875 | 8 | 47 | -108953000 | 741733000 | 5.540 | 1.338868e+08 | 33.236463 | -3.089477 |
| 269 | SCG | SCANA Corp | Utilities | MultiUtilities | 60.490002 | 7.232764 | 1.266240 | 14 | 9 | 39000000 | 746000000 | 5.220 | 1.429119e+08 | 11.588123 | -4.016461 |
| 63 | CHRW | C. H. Robinson Worldwide | Industrials | Air Freight & Logistics | 62.020000 | -9.008221 | 1.185473 | 44 | 12 | 39289000 | 509699000 | 3.520 | 1.448009e+08 | 17.619318 | 1.117804 |
| 157 | HPE | Hewlett Packard Enterprise | Information Technology | Technology Hardware, Storage & Peripherals | 15.200000 | -17.837838 | 3.400491 | 7 | 45 | 7523000000 | 2461000000 | 1.800 | 5.139877e+08 | 25.309524 | 3.954975 |
| 128 | FE | FirstEnergy Corp | Utilities | Electric Utilities | 31.730000 | 1.179844 | 1.238785 | 5 | 2 | 46000000 | 578000000 | 1.370 | 4.218978e+08 | 23.160584 | -6.072561 |
| 46 | BK | The Bank of New York Mellon Corp. | Financials | Banks | 41.220001 | 5.422003 | 1.201660 | 8 | 99 | -433000000 | 3158000000 | 2.730 | 1.156777e+09 | 15.098901 | -3.321298 |
| 66 | CINF | Cincinnati Financial | Financials | Property & Casualty Insurance | 59.169998 | 9.777358 | 0.935812 | 10 | 99 | -47000000 | 634000000 | 3.870 | 1.638243e+08 | 15.289405 | -4.327138 |
| 189 | LMT | Lockheed Martin Corp. | Industrials | Aerospace & Defense | 217.149994 | 5.254227 | 0.903098 | 116 | 8 | -356000000 | 3605000000 | 11.620 | 3.102410e+08 | 18.687607 | -10.852854 |
| 172 | ITW | Illinois Tool Works | Industrials | Industrial Machinery | 92.680000 | 12.776831 | 1.142869 | 36 | 130 | -900000000 | 1899000000 | 5.160 | 3.680233e+08 | 17.961240 | 7.586477 |
| 324 | WU | Western Union Co | Information Technology | Internet Software & Services | 17.910000 | -2.610109 | 1.273051 | 60 | 16 | -467300000 | 837800000 | 1.630 | 5.139877e+08 | 10.987730 | -8.043772 |
| 333 | XRX | Xerox Corp. | Information Technology | IT Consulting & Other Services | 10.630000 | 9.474768 | 1.866680 | 5 | 26 | -43000000 | 474000000 | 0.420 | 1.128571e+09 | 25.309524 | -0.295949 |
| 327 | WYNN | Wynn Resorts Ltd | Consumer Discretionary | Casinos & Gaming | 69.190002 | 29.496541 | 3.794783 | 174 | 198 | -102075000 | 195290000 | 1.930 | 1.011865e+08 | 35.849742 | 12.695712 |
| 325 | WY | Weyerhaeuser Corp. | Real Estate | REITs | 29.980000 | 8.544529 | 1.338067 | 10 | 116 | -568000000 | 506000000 | 0.890 | 5.685393e+08 | 33.685393 | 2.284802 |
| 168 | IP | International Paper | Materials | Paper Packaging | 37.700001 | -0.026513 | 1.301630 | 24 | 27 | -831000000 | 938000000 | 2.250 | 4.168889e+08 | 16.755556 | 6.123934 |
| 47 | BLL | Ball Corp | Materials | Metal & Glass Containers | 72.730003 | 16.535816 | 1.386684 | 22 | 10 | 32600000 | 280900000 | 2.050 | 1.370244e+08 | 35.478050 | -3.895657 |
| 113 | ES | Eversource Energy | Utilities | MultiUtilities | 51.070000 | 0.709921 | 1.232829 | 8 | 1 | -14756000 | 878485000 | 2.770 | 3.171426e+08 | 18.436823 | -1.169833 |
| 261 | PYPL | PayPal | Information Technology | Data Processing & Outsourced Services | 36.200001 | 17.456201 | 1.925754 | 9 | 25 | -808000000 | 1228000000 | 1.000 | 1.228000e+09 | 36.200001 | 5.434039 |
| 40 | BAC | Bank of America Corp | Financials | Banks | 16.830000 | 8.440722 | 1.418688 | 6 | 99 | 20764000000 | 15888000000 | 4.180 | 8.450695e+08 | 13.004785 | -0.938007 |
| 21 | AME | AMETEK Inc | Industrials | Electrical Components & Equipment | 53.590000 | 2.212474 | 1.089266 | 18 | 37 | 3390000 | 590859000 | 2.460 | 2.401866e+08 | 21.784553 | -4.490342 |
| 101 | DVA | DaVita Inc. | Health Care | Health Care Facilities | 69.709999 | -3.622291 | 1.211643 | 6 | 79 | 533875000 | 269732000 | 1.270 | 2.123874e+08 | 54.889763 | 1.962527 |
| 164 | IBM | International Business Machines | Information Technology | IT Consulting & Other Services | 137.619995 | -5.292136 | 1.082881 | 92 | 24 | -790000000 | 13190000000 | 13.480 | 9.784866e+08 | 10.209198 | 4.852391 |
| 69 | CME | CME Group Inc. | Financials | Financial Exchanges & Data | 90.599998 | -2.402245 | 1.323348 | 6 | 99 | 326500000 | 1247000000 | 3.710 | 3.361186e+08 | 24.420485 | -58.649536 |
| 53 | CAT | Caterpillar Inc. | Industrials | Construction & Farm Machinery & Heavy Trucks | 67.959999 | 3.550209 | 1.493553 | 17 | 25 | -881000000 | 2512000000 | 3.540 | 7.096045e+08 | 19.197740 | 6.264053 |
| 137 | FTR | Frontier Communications | Telecommunications Services | Integrated Telecommunications Services | 4.670000 | -2.301255 | 2.026818 | 3 | 496 | 254000000 | -196000000 | -0.290 | 6.758621e+08 | 14.518987 | 10.497704 |
| 24 | AMP | Ameriprise Financial | Financials | Asset Management & Custody Banks | 106.419998 | -2.420686 | 1.222260 | 22 | 67 | -281000000 | 1562000000 | 8.600 | 1.816279e+08 | 12.374418 | -13.398380 |
| 278 | SPGI | S&P Global, Inc. | Financials | Diversified Financial Services | 98.580002 | 14.044424 | 1.080858 | 596 | 51 | -1016000000 | 1156000000 | 4.260 | 2.713615e+08 | 23.140846 | -4.178927 |
| 184 | LEN | Lennar Corp. | Consumer Discretionary | Homebuilding | 48.910000 | 1.705136 | 1.569167 | 14 | 25 | -123369000 | 802894000 | 3.870 | 2.074661e+08 | 12.638243 | -0.307832 |
| 134 | FMC | FMC Corporation | Materials | Diversified Chemicals | 39.130001 | 15.088238 | 2.175738 | 26 | 5 | -30900000 | 489000000 | 3.660 | 1.336066e+08 | 10.691257 | 5.101546 |
| 258 | PSX | Phillips 66 | Energy | Oil & Gas Refining & Marketing & Transportation | 81.800003 | 5.371643 | 1.379589 | 18 | 41 | -2133000000 | 4227000000 | 7.780 | 5.433162e+08 | 10.514139 | 7.029056 |
| 116 | ETN | Eaton Corporation | Industrials | Industrial Conglomerates | 52.040001 | 1.166411 | 1.521430 | 13 | 10 | -513000000 | 1979000000 | 4.250 | 4.656471e+08 | 12.244706 | -8.639591 |
| 204 | MET | MetLife Inc. | Financials | Life & Health Insurance | 48.209999 | 1.366690 | 1.138650 | 8 | 99 | 1944000000 | 5310000000 | 4.610 | 1.151844e+09 | 10.457700 | -1.883912 |
| 142 | GM | General Motors | Consumer Discretionary | Automobile Manufacturers | 34.009998 | 12.281271 | 1.344514 | 24 | 33 | -3857000000 | 9687000000 | 6.110 | 1.585434e+09 | 5.566284 | -4.892037 |
| 294 | TRV | The Travelers Companies Inc. | Financials | Property & Casualty Insurance | 112.860001 | 13.029548 | 0.959365 | 15 | 99 | 6000000 | 3439000000 | 10.990 | 3.129208e+08 | 10.269336 | -0.891599 |
| 217 | MTB | M&T Bank Corp. | Financials | Banks | 121.180000 | -0.361785 | 1.380390 | 7 | 99 | -5317000 | 1079667000 | 7.220 | 1.495384e+08 | 16.783934 | -0.938007 |
| 263 | RCL | Royal Caribbean Cruises Ltd | Consumer Discretionary | Hotels, Resorts & Cruise Lines | 101.209999 | 13.425973 | 1.556512 | 8 | 9 | -67676000 | 665783000 | 3.030 | 2.197304e+08 | 33.402640 | -15.727481 |
| 56 | CCI | Crown Castle International Corp. | Real Estate | REITs | 86.449997 | 9.569068 | 0.960191 | 21 | 36 | 3190000 | 1520992000 | 4.440 | 3.425658e+08 | 19.470720 | -10.666679 |
| 61 | CHD | Church & Dwight | Consumer Staples | Household Products | 42.439999 | 1.047615 | 0.929026 | 20 | 38 | -93000000 | 410400000 | 3.130 | 1.311182e+08 | 13.559105 | -9.428134 |
| 270 | SCHW | Charles Schwab Corporation | Financials | Investment Banking & Brokerage | 32.930000 | 15.462833 | 1.456940 | 11 | 99 | 615000000 | 1447000000 | 1.040 | 1.391346e+09 | 31.663462 | -0.130090 |
| 225 | NEM | Newmont Mining Corp. (Hldg. Co.) | Materials | Gold | 17.990000 | 10.844116 | 2.536050 | 2 | 198 | 379000000 | 220000000 | 0.430 | 5.116279e+08 | 41.837209 | 6.971864 |
| 222 | NBL | Noble Energy Inc | Energy | Oil & Gas Exploration & Production | 32.930000 | 7.298791 | 2.509437 | 24 | 57 | -155000000 | -2441000000 | -6.070 | 4.021417e+08 | 93.089287 | 1.171229 |
# check number of rows and columns
data.shape
(340, 15)
#check for missing data
data.isna().sum().sort_values(ascending=False)
Ticker Symbol                   0
Security                        0
GICS Sector                     0
GICS Sub Industry               0
Current Price                   0
Price Change                    0
Volatility                      0
ROE                             0
Cash Ratio                      0
Net Cash Flow                   0
Net Income                      0
Earnings Per Share              0
Estimated Shares Outstanding    0
P/E Ratio                       0
P/B Ratio                       0
dtype: int64
data[data.duplicated()].count()
Ticker Symbol                   0
Security                        0
GICS Sector                     0
GICS Sub Industry               0
Current Price                   0
Price Change                    0
Volatility                      0
ROE                             0
Cash Ratio                      0
Net Cash Flow                   0
Net Income                      0
Earnings Per Share              0
Estimated Shares Outstanding    0
P/E Ratio                       0
P/B Ratio                       0
dtype: int64
data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 340 entries, 0 to 339
Data columns (total 15 columns):
 #   Column                        Non-Null Count  Dtype
---  ------                        --------------  -----
 0   Ticker Symbol                 340 non-null    object
 1   Security                      340 non-null    object
 2   GICS Sector                   340 non-null    object
 3   GICS Sub Industry             340 non-null    object
 4   Current Price                 340 non-null    float64
 5   Price Change                  340 non-null    float64
 6   Volatility                    340 non-null    float64
 7   ROE                           340 non-null    int64
 8   Cash Ratio                    340 non-null    int64
 9   Net Cash Flow                 340 non-null    int64
 10  Net Income                    340 non-null    int64
 11  Earnings Per Share            340 non-null    float64
 12  Estimated Shares Outstanding  340 non-null    float64
 13  P/E Ratio                     340 non-null    float64
 14  P/B Ratio                     340 non-null    float64
dtypes: float64(7), int64(4), object(4)
memory usage: 40.0+ KB
# get the column names that are object type
object_columns = list(data.select_dtypes(include=['object']).columns)
object_columns
['Ticker Symbol', 'Security', 'GICS Sector', 'GICS Sub Industry']
data.describe(include=['object']).T
| | count | unique | top | freq |
|---|---|---|---|---|
| Ticker Symbol | 340 | 340 | UNM | 1 |
| Security | 340 | 340 | EOG Resources | 1 |
| GICS Sector | 340 | 11 | Industrials | 53 |
| GICS Sub Industry | 340 | 104 | Oil & Gas Exploration & Production | 16 |
data.groupby(['GICS Sector']).size().sort_values(ascending=False).reset_index()
| | GICS Sector | 0 |
|---|---|---|
| 0 | Industrials | 53 |
| 1 | Financials | 49 |
| 2 | Consumer Discretionary | 40 |
| 3 | Health Care | 40 |
| 4 | Information Technology | 33 |
| 5 | Energy | 30 |
| 6 | Real Estate | 27 |
| 7 | Utilities | 24 |
| 8 | Materials | 20 |
| 9 | Consumer Staples | 19 |
| 10 | Telecommunications Services | 5 |
data.groupby(['GICS Sector','GICS Sub Industry']).size().sort_values(ascending=False).reset_index()
| | GICS Sector | GICS Sub Industry | 0 |
|---|---|---|---|
| 0 | Energy | Oil & Gas Exploration & Production | 16 |
| 1 | Real Estate | REITs | 14 |
| 2 | Industrials | Industrial Conglomerates | 14 |
| 3 | Information Technology | Internet Software & Services | 12 |
| 4 | Utilities | Electric Utilities | 12 |
| 5 | Health Care | Health Care Equipment | 11 |
| 6 | Utilities | MultiUtilities | 11 |
| 7 | Financials | Banks | 10 |
| 8 | Financials | Property & Casualty Insurance | 8 |
| 9 | Financials | Diversified Financial Services | 7 |
| 10 | Health Care | Biotechnology | 7 |
| 11 | Energy | Oil & Gas Refining & Marketing & Transportation | 6 |
| 12 | Consumer Staples | Packaged Foods & Meats | 6 |
| 13 | Information Technology | Semiconductors | 6 |
| 14 | Health Care | Pharmaceuticals | 6 |
| 15 | Health Care | Managed Health Care | 5 |
| 16 | Industrials | Airlines | 5 |
| 17 | Financials | Consumer Finance | 5 |
| 18 | Materials | Diversified Chemicals | 5 |
| 19 | Energy | Integrated Oil & Gas | 5 |
| 20 | Health Care | Health Care Facilities | 5 |
| 21 | Industrials | Industrial Machinery | 5 |
| 22 | Real Estate | Residential REITs | 4 |
| 23 | Financials | Asset Management & Custody Banks | 4 |
| 24 | Materials | Specialty Chemicals | 4 |
| 25 | Industrials | Research & Consulting Services | 4 |
| 26 | Consumer Discretionary | Hotels, Resorts & Cruise Lines | 4 |
| 27 | Telecommunications Services | Integrated Telecommunications Services | 4 |
| 28 | Consumer Staples | Soft Drinks | 4 |
| 29 | Industrials | Building Products | 4 |
| 30 | Industrials | Aerospace & Defense | 4 |
| 31 | Real Estate | Retail REITs | 4 |
| 32 | Industrials | Railroads | 4 |
| 33 | Consumer Discretionary | Internet & Direct Marketing Retail | 4 |
| 34 | Consumer Discretionary | Specialty Stores | 3 |
| 35 | Consumer Discretionary | Cable & Satellite | 3 |
| 36 | Financials | Regional Banks | 3 |
| 37 | Information Technology | IT Consulting & Other Services | 3 |
| 38 | Health Care | Health Care Distributors | 3 |
| 39 | Financials | Life & Health Insurance | 3 |
| 40 | Real Estate | Specialized REITs | 3 |
| 41 | Financials | Insurance Brokers | 3 |
| 42 | Industrials | Air Freight & Logistics | 3 |
| 43 | Consumer Discretionary | Restaurants | 3 |
| 44 | Energy | Oil & Gas Equipment & Services | 3 |
| 45 | Industrials | Construction & Farm Machinery & Heavy Trucks | 3 |
| 46 | Consumer Staples | Household Products | 3 |
| 47 | Health Care | Health Care Supplies | 2 |
| 48 | Information Technology | Application Software | 2 |
| 49 | Materials | Construction Materials | 2 |
| 50 | Information Technology | Data Processing & Outsourced Services | 2 |
| 51 | Consumer Discretionary | Advertising | 2 |
| 52 | Information Technology | Electronic Components | 2 |
| 53 | Consumer Discretionary | Auto Parts & Equipment | 2 |
| 54 | Financials | Investment Banking & Brokerage | 2 |
| 55 | Consumer Discretionary | Automobile Manufacturers | 2 |
| 56 | Consumer Staples | Tobacco | 2 |
| 57 | Consumer Discretionary | Broadcasting & Cable TV | 2 |
| 58 | Materials | Paper Packaging | 2 |
| 59 | Consumer Discretionary | Homebuilding | 2 |
| 60 | Materials | Fertilizers & Agricultural Chemicals | 2 |
| 61 | Consumer Discretionary | Leisure Products | 2 |
| 62 | Information Technology | Networking Equipment | 1 |
| 63 | Information Technology | Home Entertainment Software | 1 |
| 64 | Telecommunications Services | Alternative Carriers | 1 |
| 65 | Materials | Copper | 1 |
| 66 | Real Estate | Real Estate Services | 1 |
| 67 | Real Estate | Office REITs | 1 |
| 68 | Materials | Steel | 1 |
| 69 | Information Technology | Technology Hardware, Storage & Peripherals | 1 |
| 70 | Materials | Metal & Glass Containers | 1 |
| 71 | Information Technology | Electronic Equipment & Instruments | 1 |
| 72 | Materials | Industrial Gases | 1 |
| 73 | Materials | Gold | 1 |
| 74 | Information Technology | Semiconductor Equipment | 1 |
| 75 | Health Care | Life Sciences Tools & Services | 1 |
| 76 | Information Technology | Computer Hardware | 1 |
| 77 | Industrials | Trucking | 1 |
| 78 | Consumer Discretionary | Casinos & Gaming | 1 |
| 79 | Consumer Discretionary | Consumer Electronics | 1 |
| 80 | Consumer Discretionary | Distributors | 1 |
| 81 | Consumer Discretionary | Home Furnishings | 1 |
| 82 | Consumer Discretionary | Household Appliances | 1 |
| 83 | Consumer Discretionary | Housewares & Specialties | 1 |
| 84 | Consumer Discretionary | Motorcycle Manufacturers | 1 |
| 85 | Consumer Discretionary | Publishing | 1 |
| 86 | Consumer Discretionary | Specialty Retail | 1 |
| 87 | Consumer Discretionary | Tires & Rubber | 1 |
| 88 | Consumer Staples | Agricultural Products | 1 |
| 89 | Consumer Staples | Brewers | 1 |
| 90 | Consumer Staples | Drug Retail | 1 |
| 91 | Consumer Staples | Personal Products | 1 |
| 92 | Financials | Financial Exchanges & Data | 1 |
| 93 | Financials | Multi-Sector Holdings | 1 |
| 94 | Financials | Multi-line Insurance | 1 |
| 95 | Financials | Thrifts & Mortgage Finance | 1 |
| 96 | Consumer Discretionary | Apparel, Accessories & Luxury Goods | 1 |
| 97 | Industrials | Diversified Commercial Services | 1 |
| 98 | Industrials | Electrical Components & Equipment | 1 |
| 99 | Industrials | Environmental Services | 1 |
| 100 | Industrials | Human Resource & Employment Services | 1 |
| 101 | Industrials | Industrial Materials | 1 |
| 102 | Industrials | Technology, Hardware, Software and Supplies | 1 |
| 103 | Utilities | Water Utilities | 1 |
# fixing column names
data.columns = [c.replace("/", "_") for c in data.columns]
data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 340 entries, 0 to 339
Data columns (total 15 columns):
 #   Column                        Non-Null Count  Dtype
---  ------                        --------------  -----
 0   Ticker Symbol                 340 non-null    object
 1   Security                      340 non-null    object
 2   GICS Sector                   340 non-null    object
 3   GICS Sub Industry             340 non-null    object
 4   Current Price                 340 non-null    float64
 5   Price Change                  340 non-null    float64
 6   Volatility                    340 non-null    float64
 7   ROE                           340 non-null    int64
 8   Cash Ratio                    340 non-null    int64
 9   Net Cash Flow                 340 non-null    int64
 10  Net Income                    340 non-null    int64
 11  Earnings Per Share            340 non-null    float64
 12  Estimated Shares Outstanding  340 non-null    float64
 13  P_E Ratio                     340 non-null    float64
 14  P_B Ratio                     340 non-null    float64
dtypes: float64(7), int64(4), object(4)
memory usage: 40.0+ KB
# Let's look at the statistical summary of the data
data.describe().T
| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| Current Price | 340.0 | 8.086234e+01 | 9.805509e+01 | 4.500000e+00 | 3.855500e+01 | 5.970500e+01 | 9.288000e+01 | 1.274950e+03 |
| Price Change | 340.0 | 4.078194e+00 | 1.200634e+01 | -4.712969e+01 | -9.394838e-01 | 4.819505e+00 | 1.069549e+01 | 5.505168e+01 |
| Volatility | 340.0 | 1.525976e+00 | 5.917984e-01 | 7.331632e-01 | 1.134878e+00 | 1.385593e+00 | 1.695549e+00 | 4.580042e+00 |
| ROE | 340.0 | 3.959706e+01 | 9.654754e+01 | 1.000000e+00 | 9.750000e+00 | 1.500000e+01 | 2.700000e+01 | 9.170000e+02 |
| Cash Ratio | 340.0 | 7.002353e+01 | 9.042133e+01 | 0.000000e+00 | 1.800000e+01 | 4.700000e+01 | 9.900000e+01 | 9.580000e+02 |
| Net Cash Flow | 340.0 | 5.553762e+07 | 1.946365e+09 | -1.120800e+10 | -1.939065e+08 | 2.098000e+06 | 1.698108e+08 | 2.076400e+10 |
| Net Income | 340.0 | 1.494385e+09 | 3.940150e+09 | -2.352800e+10 | 3.523012e+08 | 7.073360e+08 | 1.899000e+09 | 2.444200e+10 |
| Earnings Per Share | 340.0 | 2.776662e+00 | 6.587779e+00 | -6.120000e+01 | 1.557500e+00 | 2.895000e+00 | 4.620000e+00 | 5.009000e+01 |
| Estimated Shares Outstanding | 340.0 | 5.770283e+08 | 8.458496e+08 | 2.767216e+07 | 1.588482e+08 | 3.096751e+08 | 5.731175e+08 | 6.159292e+09 |
| P_E Ratio | 340.0 | 3.261256e+01 | 4.434873e+01 | 2.935451e+00 | 1.504465e+01 | 2.081988e+01 | 3.176476e+01 | 5.280391e+02 |
| P_B Ratio | 340.0 | -1.718249e+00 | 1.396691e+01 | -7.611908e+01 | -4.352056e+00 | -1.067170e+00 | 3.917066e+00 | 1.290646e+02 |
#function to plot a boxplot and a histogram along the same scale.
# import the library for labelling
import matplotlib.patheffects as path_effects
def add_median_labels(ax):
    lines = ax.get_lines()
    # determine number of lines per box (this varies with/without fliers)
    boxes = [c for c in ax.get_children() if type(c).__name__ == 'PathPatch']
    lines_per_box = int(len(lines) / len(boxes))
    # iterate over median lines
    for median in lines[4:len(lines):lines_per_box]:
        # display median value at center of median line
        x, y = (coords.mean() for coords in median.get_data())
        # choose value depending on horizontal or vertical plot orientation
        value = x if (median.get_xdata()[1]-median.get_xdata()[0]) == 0 else y
        text = ax.text(x, y, f'{value:.1f}', ha='center', va='center',
                       fontweight='bold', color='white', bbox=dict(facecolor='black'), size=15)
        # create median-colored border around white text for contrast
        text.set_path_effects([
            path_effects.Stroke(linewidth=3, foreground=median.get_color()),
            path_effects.Normal(),
        ])
def box_and_histogram(column, figsize=(10,10), bins = None):
    """ Boxplot and histogram together, with median labels on boxplot
    column: dataframe column
    figsize: size of fig (default (10,10))
    bins: number of bins (default None / auto)
    color of mean is green and median is black
    """
    f2, (ax_box2, ax_hist2) = plt.subplots(nrows = 2, # Number of rows of the subplot grid= 2
                                           sharex = True, # x-axis will be shared among all subplots
                                           gridspec_kw = {"height_ratios": (.25, .75)},
                                           figsize = figsize
                                           ) # creating the 2 subplots
    # pass the column as x so the boxplot is horizontal and shares the x-axis
    box_plot = sns.boxplot(x=column, ax=ax_box2, showmeans=True, color='red')
    add_median_labels(box_plot.axes)
    # histogram on the lower axis (histplot replaces the deprecated distplot)
    if bins:
        sns.histplot(column, kde=False, bins=bins, ax=ax_hist2)
    else:
        sns.histplot(column, kde=True, ax=ax_hist2)
    ax_hist2.axvline(np.mean(column), color='g', linestyle='--') # Add mean to the histogram
    ax_hist2.axvline(np.median(column), color='black', linestyle='-') # Add median to the histogram
# function to create labeled barplots
def labeled_barplot(data, feature, perc=False, n=None):
    """
    Barplot with percentage at the top
    data: dataframe
    feature: dataframe column
    perc: whether to display percentages instead of count (default is False)
    n: displays the top n category levels (default is None, i.e., display all levels)
    """
    total = len(data[feature])  # length of the column
    count = data[feature].nunique()
    if n is None:
        plt.figure(figsize=(count + 1, 5))
    else:
        plt.figure(figsize=(n + 1, 5))
    plt.xticks(rotation=90, fontsize=15)
    ax = sns.countplot(
        data=data,
        x=feature,
        palette="RdYlBu",
        order=data[feature].value_counts().index
    )
    for p in ax.patches:
        if perc:
            label = "{:.1f}%".format(
                100 * p.get_height() / total
            )  # percentage of each class of the category
        else:
            label = p.get_height()  # count of each level of the category
        x = p.get_x() + p.get_width() / 2  # horizontal center of the bar
        y = p.get_height()  # height of the bar
        ax.annotate(
            label,
            (x, y),
            ha="center",
            va="center",
            size=12,
            xytext=(0, 5),
            textcoords="offset points",
        )  # annotate the bar with its label
    plt.show()  # show the plot
# let's look at the share of stocks in each GICS sector
labeled_barplot(data, "GICS Sector", perc=True)
# Let's visualize the data for ['Current Price']
columns = ['Current Price']
for col in columns:
    box_and_histogram(data[col])
# Let's visualize the data for ['Price Change']
columns = ['Price Change']
for col in columns:
    box_and_histogram(data[col])
# Let's visualize the data for ['Volatility']
columns = ['Volatility']
for col in columns:
    box_and_histogram(data[col])
# Let's visualize the data for ['ROE']
columns = ['ROE']
for col in columns:
    box_and_histogram(data[col])
# Let's visualize the data for ['Cash Ratio']
columns = ['Cash Ratio']
for col in columns:
    box_and_histogram(data[col])
# Let's visualize the data for ['Net Cash Flow']
columns = ['Net Cash Flow']
for col in columns:
    box_and_histogram(data[col])
# Let's visualize the data for ['Net Income']
columns = ['Net Income']
for col in columns:
    box_and_histogram(data[col])
# Let's visualize the data for ['Earnings Per Share']
columns = ['Earnings Per Share']
for col in columns:
    box_and_histogram(data[col])
# Let's visualize the data for ['Estimated Shares Outstanding']
columns = ['Estimated Shares Outstanding']
for col in columns:
    box_and_histogram(data[col])
# Let's visualize the data for ['P_E Ratio']
columns = ['P_E Ratio']
for col in columns:
    box_and_histogram(data[col])
# Let's visualize the data for ['P_B Ratio']
columns = ['P_B Ratio']
for col in columns:
    box_and_histogram(data[col])
# selecting numerical columns
num_col = data.select_dtypes(include=np.number).columns.tolist()
num_col
['Current Price', 'Price Change', 'Volatility', 'ROE', 'Cash Ratio', 'Net Cash Flow', 'Net Income', 'Earnings Per Share', 'Estimated Shares Outstanding', 'P_E Ratio', 'P_B Ratio']
# function to plot a boxplot and a histogram along the same scale.
def histogram_boxplot(data, feature, figsize=(12, 7), kde=False, bins=None):
    """
    Boxplot and histogram combined
    data: dataframe
    feature: dataframe column
    figsize: size of figure (default (12,7))
    kde: whether to show the density curve (default False)
    bins: number of bins for histogram (default None)
    """
    f2, (ax_box2, ax_hist2) = plt.subplots(
        nrows=2,  # Number of rows of the subplot grid= 2
        sharex=True,  # x-axis will be shared among all subplots
        gridspec_kw={"height_ratios": (0.25, 0.75)},
        figsize=figsize,
    )  # creating the 2 subplots
    sns.boxplot(
        data=data, x=feature, ax=ax_box2, showmeans=True, color="violet"
    )  # boxplot will be created and a triangle marker will indicate the mean value of the column
    sns.histplot(
        data=data, x=feature, kde=kde, ax=ax_hist2, bins=bins, palette="winter"
    ) if bins else sns.histplot(
        data=data, x=feature, kde=kde, ax=ax_hist2
    )  # For histogram
    ax_hist2.axvline(
        data[feature].mean(), color="green", linestyle="--"
    )  # Add mean to the histogram
    ax_hist2.axvline(
        data[feature].median(), color="black", linestyle="-"
    )  # Add median to the histogram
for item in num_col:
    histogram_boxplot(data, item)
sns.pairplot(data=data[num_col], diag_kind="kde")
plt.show()
fig, axes = plt.subplots(6, 2, figsize=(20, 15))
fig.suptitle("CDF plot of numerical variables", fontsize=20)
counter = 0
for ii in range(6):
    sns.ecdfplot(ax=axes[ii][0], x=data[num_col[counter]])
    counter = counter + 1
    if counter != 11:
        sns.ecdfplot(ax=axes[ii][1], x=data[num_col[counter]])
        counter = counter + 1
    else:
        pass
fig.tight_layout(pad=2.0)
Questions:
columns = ['Price Change']
for col in columns:
    box_and_histogram(data[col])
vol_mean = data.groupby('GICS Sector')['Volatility'].agg(['mean', 'count']).sort_values(by='mean',ascending=False)
vol_mean
| GICS Sector | mean | count |
|---|---|---|
| Energy | 2.568777 | 30 |
| Materials | 1.816726 | 20 |
| Information Technology | 1.659801 | 33 |
| Consumer Discretionary | 1.595478 | 40 |
| Health Care | 1.541023 | 40 |
| Industrials | 1.416989 | 53 |
| Telecommunications Services | 1.341612 | 5 |
| Financials | 1.267255 | 49 |
| Real Estate | 1.206053 | 27 |
| Consumer Staples | 1.152675 | 19 |
| Utilities | 1.118018 | 24 |
vol_results =vol_mean.reset_index()
vol_results
| | GICS Sector | mean | count |
|---|---|---|---|
| 0 | Energy | 2.568777 | 30 |
| 1 | Materials | 1.816726 | 20 |
| 2 | Information Technology | 1.659801 | 33 |
| 3 | Consumer Discretionary | 1.595478 | 40 |
| 4 | Health Care | 1.541023 | 40 |
| 5 | Industrials | 1.416989 | 53 |
| 6 | Telecommunications Services | 1.341612 | 5 |
| 7 | Financials | 1.267255 | 49 |
| 8 | Real Estate | 1.206053 | 27 |
| 9 | Consumer Staples | 1.152675 | 19 |
| 10 | Utilities | 1.118018 | 24 |
sns.set(rc={'figure.figsize':(21,7)})
sns.catplot(x="GICS Sector", y="mean", kind="bar", data=vol_results, height=7, aspect=3);
plt.figure(figsize=(15, 7))
sns.heatmap(data[num_col].corr(), annot=True, vmin=-1, vmax=1, fmt=".2f", cmap="Spectral")
plt.show()
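The heatmap shows every pairwise correlation at once; a ranked list of the strongest pairs can be easier to scan. The following is a minimal sketch, assuming a purely numeric dataframe such as `data[num_col]`. The helper name `top_correlations` and the toy frame are hypothetical and used only for illustration.

```python
import numpy as np
import pandas as pd


def top_correlations(df, n=5):
    """Return the n column pairs with the largest absolute correlation."""
    corr = df.corr()
    # keep only the upper triangle so each pair appears once (no diagonal)
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(mask).stack().dropna()
    # rank by absolute value but keep the sign in the result
    ranked = pairs.sort_values(key=lambda s: s.abs(), ascending=False)
    return ranked.head(n)


# toy frame (hypothetical values, not the stock data)
toy = pd.DataFrame(
    {
        "a": [1, 2, 3, 4, 5],
        "b": [2, 4, 6, 8, 10.5],  # almost a multiple of "a"
        "c": [5, 1, 4, 2, 3],
    }
)
top = top_correlations(toy, n=1)
print(top)
```

On the stock data this could be called as `top_correlations(data[num_col])`.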
cash_ratio_mean = data.groupby('GICS Sector')['Cash Ratio'].agg(['mean', 'count']).sort_values(by='mean',ascending=False)
cash_ratio_mean
| GICS Sector | mean | count |
|---|---|---|
| Information Technology | 149.818182 | 33 |
| Telecommunications Services | 117.000000 | 5 |
| Health Care | 103.775000 | 40 |
| Financials | 98.591837 | 49 |
| Consumer Staples | 70.947368 | 19 |
| Energy | 51.133333 | 30 |
| Real Estate | 50.111111 | 27 |
| Consumer Discretionary | 49.575000 | 40 |
| Materials | 41.700000 | 20 |
| Industrials | 36.188679 | 53 |
| Utilities | 13.625000 | 24 |
cash_ratio_results = cash_ratio_mean.reset_index()
cash_ratio_results
| | GICS Sector | mean | count |
|---|---|---|---|
| 0 | Information Technology | 149.818182 | 33 |
| 1 | Telecommunications Services | 117.000000 | 5 |
| 2 | Health Care | 103.775000 | 40 |
| 3 | Financials | 98.591837 | 49 |
| 4 | Consumer Staples | 70.947368 | 19 |
| 5 | Energy | 51.133333 | 30 |
| 6 | Real Estate | 50.111111 | 27 |
| 7 | Consumer Discretionary | 49.575000 | 40 |
| 8 | Materials | 41.700000 | 20 |
| 9 | Industrials | 36.188679 | 53 |
| 10 | Utilities | 13.625000 | 24 |
sns.set(rc={'figure.figsize':(21,7)})
sns.catplot(x="GICS Sector", y="mean", kind="bar", data=cash_ratio_results, height=7, aspect=3);
P_E_mean = data.groupby('GICS Sector')['P_E Ratio'].agg(['mean', 'count']).sort_values(by='mean',ascending=False)
P_E_mean
| GICS Sector | mean | count |
|---|---|---|
| Energy | 72.897709 | 30 |
| Information Technology | 43.782546 | 33 |
| Real Estate | 43.065585 | 27 |
| Health Care | 41.135272 | 40 |
| Consumer Discretionary | 35.211613 | 40 |
| Consumer Staples | 25.521195 | 19 |
| Materials | 24.585352 | 20 |
| Utilities | 18.719412 | 24 |
| Industrials | 18.259380 | 53 |
| Financials | 16.023151 | 49 |
| Telecommunications Services | 12.222578 | 5 |
P_E_results = P_E_mean.reset_index()
P_E_results
| | GICS Sector | mean | count |
|---|---|---|---|
| 0 | Energy | 72.897709 | 30 |
| 1 | Information Technology | 43.782546 | 33 |
| 2 | Real Estate | 43.065585 | 27 |
| 3 | Health Care | 41.135272 | 40 |
| 4 | Consumer Discretionary | 35.211613 | 40 |
| 5 | Consumer Staples | 25.521195 | 19 |
| 6 | Materials | 24.585352 | 20 |
| 7 | Utilities | 18.719412 | 24 |
| 8 | Industrials | 18.259380 | 53 |
| 9 | Financials | 16.023151 | 49 |
| 10 | Telecommunications Services | 12.222578 | 5 |
sns.set(rc={'figure.figsize':(21,7)})
sns.catplot(x="GICS Sector", y="mean", kind="bar", data=P_E_results, height=7, aspect=3);
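The sector summaries for volatility, cash ratio, and P/E ratio above all follow the same group, aggregate, sort, and reset pattern. As a sketch of how that pattern could be factored out, the hypothetical helper `sector_summary` below wraps it in one function; the toy frame is made up for illustration and is not the stock data.

```python
import pandas as pd


def sector_summary(df, value_col, group_col="GICS Sector"):
    """Mean and count of value_col per group, sorted by mean (descending)."""
    return (
        df.groupby(group_col)[value_col]
        .agg(["mean", "count"])
        .sort_values(by="mean", ascending=False)
        .reset_index()
    )


# toy frame (hypothetical values)
toy = pd.DataFrame(
    {
        "GICS Sector": ["Energy", "Energy", "Utilities"],
        "Volatility": [2.0, 3.0, 1.0],
    }
)
summary = sector_summary(toy, "Volatility")
print(summary)
```

With the notebook's dataframe, `sector_summary(data, "Cash Ratio")` should return the same kind of table as `cash_ratio_results`.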
# scaling the dataset before clustering
scaler = StandardScaler()
subset = data[num_col].copy()
subset_scaled = scaler.fit_transform(subset)
# creating a dataframe of the scaled columns
subset_scaled_df = pd.DataFrame(subset_scaled, columns=subset.columns)
for item in num_col:
    histogram_boxplot(subset_scaled_df, item)
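`StandardScaler` converts each column to z-scores: it subtracts the column mean and divides by the population standard deviation (`ddof=0`), so every scaled column has mean 0 and standard deviation 1. The sketch below checks this on synthetic data; the column names and values are made up for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# synthetic columns on very different scales (hypothetical values)
rng = np.random.default_rng(0)
toy = pd.DataFrame(
    {
        "price": rng.normal(80, 100, 50),
        "net_income": rng.normal(1e9, 4e9, 50),
    }
)

# scale with scikit-learn
scaled = pd.DataFrame(StandardScaler().fit_transform(toy), columns=toy.columns)

# the same z-score computed by hand with the population standard deviation
manual = (toy - toy.mean()) / toy.std(ddof=0)

print(scaled.mean().round(6))
print(scaled.std(ddof=0).round(6))
```

Because k-means relies on Euclidean distance, this step keeps large-magnitude columns such as net income from dominating the clustering.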
clusters = range(1, 12)
meanDistortions = []
for k in clusters:
    model = KMeans(n_clusters=k)
    model.fit(subset_scaled_df)
    prediction = model.predict(subset_scaled_df)
    distortion = (
        sum(
            np.min(cdist(subset_scaled_df, model.cluster_centers_, "euclidean"), axis=1)
        )
        / subset_scaled_df.shape[0]
    )
    meanDistortions.append(distortion)
    print("Number of Clusters:", k, "\tAverage Distortion:", distortion)
plt.plot(clusters, meanDistortions, "bx-")
plt.xlabel("k")
plt.ylabel("Average Distortion")
plt.title("Selecting k with the Elbow Method", fontsize=20)
Number of Clusters: 1 Average Distortion: 2.5425069919221697
Number of Clusters: 2 Average Distortion: 2.382318498894466
Number of Clusters: 3 Average Distortion: 2.2692367155390745
Number of Clusters: 4 Average Distortion: 2.176396791566185
Number of Clusters: 5 Average Distortion: 2.128799332840716
Number of Clusters: 6 Average Distortion: 2.0591416288820374
Number of Clusters: 7 Average Distortion: 1.9826333396712665
Number of Clusters: 8 Average Distortion: 1.9753526418461937
Number of Clusters: 9 Average Distortion: 1.8964970616244075
Number of Clusters: 10 Average Distortion: 1.8539123989265462
Number of Clusters: 11 Average Distortion: 1.7997730037404913
Text(0.5, 1.0, 'Selecting k with the Elbow Method')
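The elbow loop above creates `KMeans` without a `random_state`, so the distortion values can differ slightly from run to run. The sketch below computes the same average distortion (mean Euclidean distance from each point to its nearest centroid) with a fixed seed. It runs on synthetic blobs rather than the stock data; the helper `average_distortion`, the seed, and `n_init=10` are assumptions made for illustration.

```python
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# three well-separated synthetic clusters (not the stock data)
X, _ = make_blobs(
    n_samples=150,
    centers=[[0, 0], [10, 10], [-10, 10]],
    cluster_std=0.5,
    random_state=0,
)


def average_distortion(X, k, seed=1):
    """Mean distance from each sample to its nearest k-means centroid."""
    model = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X)
    nearest = np.min(cdist(X, model.cluster_centers_, "euclidean"), axis=1)
    return nearest.mean()


distortions = {k: average_distortion(X, k) for k in range(1, 7)}
for k, d in distortions.items():
    print("Number of Clusters:", k, "\tAverage Distortion:", d)
```

On data like this the curve drops sharply up to the true number of clusters and flattens afterwards, which is the "elbow" to look for.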
sil_score = []
cluster_list = list(range(2, 15))
for n_clusters in cluster_list:
    clusterer = KMeans(n_clusters=n_clusters)
    preds = clusterer.fit_predict(subset_scaled_df)
    # centers = clusterer.cluster_centers_
    score = silhouette_score(subset_scaled_df, preds)
    sil_score.append(score)
    print("For n_clusters = {}, the silhouette score is {}".format(n_clusters, score))
plt.plot(cluster_list, sil_score)
plt.show()
For n_clusters = 2, the silhouette score is 0.43969639509980457
For n_clusters = 3, the silhouette score is 0.45797710447228496
For n_clusters = 4, the silhouette score is 0.45434371948348606
For n_clusters = 5, the silhouette score is 0.40759857447931497
For n_clusters = 6, the silhouette score is 0.4194206033258803
For n_clusters = 7, the silhouette score is 0.4185436281362793
For n_clusters = 8, the silhouette score is 0.4146191398667421
For n_clusters = 9, the silhouette score is 0.38967567488717586
For n_clusters = 10, the silhouette score is 0.13143171655836627
For n_clusters = 11, the silhouette score is 0.12272712489085548
For n_clusters = 12, the silhouette score is 0.15775934538491893
For n_clusters = 13, the silhouette score is 0.16265801494654208
For n_clusters = 14, the silhouette score is 0.16545323314088417
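The silhouette score ranges from -1 to 1, and higher values indicate tighter, better-separated clusters, so one simple way to read the list above is to take the k with the highest score. The sketch below does this on synthetic blobs with a fixed seed; the data, the seed, and `n_init=10` are assumptions for illustration, and on real data the highest score is only one input to the choice of k.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# three well-separated synthetic clusters (not the stock data)
X, _ = make_blobs(
    n_samples=150,
    centers=[[0, 0], [10, 10], [-10, 10]],
    cluster_std=0.5,
    random_state=0,
)

# silhouette score for each candidate number of clusters
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=1).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

# the candidate with the highest score
best_k = max(scores, key=scores.get)
print("best k by silhouette score:", best_k)
```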
# finding optimal no. of clusters with silhouette coefficients
visualizer = SilhouetteVisualizer(KMeans(6, random_state=1))
visualizer.fit(subset_scaled_df)
visualizer.show()
<AxesSubplot:title={'center':'Silhouette Plot of KMeans Clustering for 340 Samples in 6 Centers'}, xlabel='silhouette coefficient values', ylabel='cluster label'>
# finding optimal no. of clusters with silhouette coefficients
visualizer = SilhouetteVisualizer(KMeans(7, random_state=1))
visualizer.fit(subset_scaled_df)
visualizer.show()
<AxesSubplot:title={'center':'Silhouette Plot of KMeans Clustering for 340 Samples in 7 Centers'}, xlabel='silhouette coefficient values', ylabel='cluster label'>
# finding optimal no. of clusters with silhouette coefficients
visualizer = SilhouetteVisualizer(KMeans(8, random_state=1))
visualizer.fit(subset_scaled_df)
visualizer.show()
<AxesSubplot:title={'center':'Silhouette Plot of KMeans Clustering for 340 Samples in 8 Centers'}, xlabel='silhouette coefficient values', ylabel='cluster label'>
# finding optimal no. of clusters with silhouette coefficients
visualizer = SilhouetteVisualizer(KMeans(11, random_state=1))
visualizer.fit(subset_scaled_df)
visualizer.show()
<AxesSubplot:title={'center':'Silhouette Plot of KMeans Clustering for 340 Samples in 11 Centers'}, xlabel='silhouette coefficient values', ylabel='cluster label'>
# finding optimal no. of clusters with silhouette coefficients
visualizer = SilhouetteVisualizer(KMeans(12, random_state=1))
visualizer.fit(subset_scaled_df)
visualizer.show()
<AxesSubplot:title={'center':'Silhouette Plot of KMeans Clustering for 340 Samples in 12 Centers'}, xlabel='silhouette coefficient values', ylabel='cluster label'>
# finding optimal no. of clusters with silhouette coefficients
visualizer = SilhouetteVisualizer(KMeans(13, random_state=1))
visualizer.fit(subset_scaled_df)
visualizer.show()
<AxesSubplot:title={'center':'Silhouette Plot of KMeans Clustering for 340 Samples in 13 Centers'}, xlabel='silhouette coefficient values', ylabel='cluster label'>
# let's take 8 as number of clusters
kmeans = KMeans(n_clusters=8, random_state=0)
kmeans.fit(subset_scaled_df)
KMeans(random_state=0)
# adding kmeans cluster labels to the original and scaled dataframes
data["K_means_segments"] = kmeans.labels_
subset_scaled_df["K_means_segments"] = kmeans.labels_
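# numeric_only=True keeps the profile to numeric columns; recent pandas versions
# raise an error when asked to average text columns
cluster_profile = data.groupby("K_means_segments").mean(numeric_only=True)
cluster_profile["count_in_each_segments"] = (
    data.groupby("K_means_segments")["P_E Ratio"].count().values
)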
# let's display cluster profiles
cluster_profile.style.highlight_max(color="lightgreen", axis=0)
| K_means_segments | Current Price | Price Change | Volatility | ROE | Cash Ratio | Net Cash Flow | Net Income | Earnings Per Share | Estimated Shares Outstanding | P_E Ratio | P_B Ratio | count_in_each_segments |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 508.534992 | 5.732177 | 1.504640 | 27.250000 | 150.875000 | 37895875.000000 | 1116994125.000000 | 15.965000 | 75654420.935000 | 43.727459 | 29.581664 | 8 |
| 1 | 80.152167 | 14.571437 | 1.829679 | 28.100000 | 321.850000 | 625929050.000000 | 942050500.000000 | 2.010500 | 790456335.265000 | 45.067457 | 8.307945 | 20 |
| 2 | 71.419603 | 4.948894 | 1.373081 | 25.337121 | 51.272727 | 19976181.818182 | 1588185079.545455 | 3.719489 | 438021401.544886 | 23.214179 | -3.335092 | 264 |
| 3 | 34.231808 | -15.515565 | 2.832069 | 48.037037 | 47.740741 | -128651518.518519 | -2444318518.518518 | -6.284444 | 503031539.057037 | 75.627265 | 1.655990 | 27 |
| 4 | 46.672222 | 5.166566 | 1.079367 | 25.000000 | 58.333333 | -3040666666.666667 | 14848444444.444445 | 3.435556 | 4564959946.222222 | 15.596051 | -6.354193 | 9 |
| 5 | 84.355716 | 3.854981 | 1.827670 | 633.571429 | 33.571429 | -568400000.000000 | -4968157142.857142 | -10.841429 | 398169036.442857 | 42.284541 | -11.589502 | 7 |
| 6 | 25.640000 | 11.237908 | 1.322355 | 12.500000 | 130.500000 | 16755500000.000000 | 13654000000.000000 | 3.295000 | 2791829362.100000 | 13.649696 | 1.508484 | 2 |
| 7 | 327.006671 | 21.917380 | 2.029752 | 4.000000 | 106.000000 | 698240666.666667 | 287547000.000000 | 0.750000 | 366763235.300000 | 400.989188 | -5.322376 | 3 |
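The profiling step can be tried in isolation on a small frame. This is a sketch under stated assumptions: the `toy` DataFrame and its values are made up for illustration, and `size()` is swapped in for the per-column `count()` used in the notebook, since it counts rows regardless of missing values.

```python
import pandas as pd

# Toy stand-in for `data` (values are invented for illustration only)
toy = pd.DataFrame(
    {
        "Security": ["A", "B", "C", "D"],
        "Current Price": [10.0, 20.0, 100.0, 300.0],
        "ROE": [5.0, 15.0, 30.0, 50.0],
        "K_means_segments": [0, 0, 1, 1],
    }
)

# Mean of each numeric column per segment; the text column is skipped
profile = toy.groupby("K_means_segments").mean(numeric_only=True)

# Number of rows in each segment, aligned on the segment label
profile["count_in_each_segments"] = toy.groupby("K_means_segments").size()

print(profile)
```

Because both results are indexed by segment label, the count column aligns by label rather than by position.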
plt.figure(figsize=(20, 40))
plt.suptitle("Boxplot of scaled numerical variables for each cluster", fontsize=20)
# one subplot per numerical variable (11 in total) on a 3 x 5 grid
for i, variable in enumerate(num_col[:11]):
    plt.subplot(3, 5, i + 1)
    sns.boxplot(
        y=subset_scaled_df[variable],
        x=subset_scaled_df["K_means_segments"],
    )
plt.tight_layout()
plt.show()
plt.figure(figsize=(20, 40))
plt.suptitle("Boxplot of original numerical variables for each cluster", fontsize=20)
# one subplot per numerical variable (11 in total) on a 3 x 5 grid
for i, variable in enumerate(num_col[:11]):
    plt.subplot(3, 5, i + 1)
    sns.boxplot(
        y=data[variable],
        x=data["K_means_segments"],
    )
plt.tight_layout()
plt.show()
# list of distance metrics
distance_metrics = ["euclidean", "chebyshev", "mahalanobis", "cityblock"]
# list of linkage methods
linkage_methods = ["single", "complete", "average", "weighted"]
high_cophenet_corr = 0
high_dm_lm = [0, 0]
for dm in distance_metrics:
    for lm in linkage_methods:
        Z = linkage(subset_scaled_df, metric=dm, method=lm)
        c, coph_dists = cophenet(Z, pdist(subset_scaled_df))
        print(
            "Cophenetic correlation for {} distance and {} linkage is {}.".format(
                dm.capitalize(), lm, c
            )
        )
        if high_cophenet_corr < c:
            high_cophenet_corr = c
            high_dm_lm[0] = dm
            high_dm_lm[1] = lm
Cophenetic correlation for Euclidean distance and single linkage is 0.9345526329379537.
Cophenetic correlation for Euclidean distance and complete linkage is 0.8452929808109975.
Cophenetic correlation for Euclidean distance and average linkage is 0.9495821453768725.
Cophenetic correlation for Euclidean distance and weighted linkage is 0.9156327901892896.
Cophenetic correlation for Chebyshev distance and single linkage is 0.9177074000650272.
Cophenetic correlation for Chebyshev distance and complete linkage is 0.8031917698414331.
Cophenetic correlation for Chebyshev distance and average linkage is 0.93724310450871.
Cophenetic correlation for Chebyshev distance and weighted linkage is 0.8906157116060892.
Cophenetic correlation for Mahalanobis distance and single linkage is 0.9306417886860311.
Cophenetic correlation for Mahalanobis distance and complete linkage is 0.7932196483047811.
Cophenetic correlation for Mahalanobis distance and average linkage is 0.9371656111212013.
Cophenetic correlation for Mahalanobis distance and weighted linkage is 0.8437991848610089.
Cophenetic correlation for Cityblock distance and single linkage is 0.9411400419314817.
Cophenetic correlation for Cityblock distance and complete linkage is 0.8020658406850402.
Cophenetic correlation for Cityblock distance and average linkage is 0.9338637157423139.
Cophenetic correlation for Cityblock distance and weighted linkage is 0.7239584217766201.
# printing the combination of distance metric and linkage method with the highest cophenetic correlation
print(
    "Highest cophenetic correlation is {}, which is obtained with {} distance and {} linkage.".format(
        high_cophenet_corr, high_dm_lm[0].capitalize(), high_dm_lm[1]
    )
)
Highest cophenetic correlation is 0.9495821453768725, which is obtained with Euclidean distance and average linkage.
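One detail of the loop is worth noting: `pdist(subset_scaled_df)` falls back to its default Euclidean metric, so trees built with other metrics are compared against Euclidean distances. If a like-for-like comparison is preferred, one option is to compute the condensed distances once per metric and pass them to both `linkage` and `cophenet`. The sketch below illustrates that variant on synthetic data; the array `X`, the `results` dictionary, and the choice of average linkage are assumptions for illustration, and its numbers are not comparable with the output above.

```python
import numpy as np
from scipy.cluster.hierarchy import cophenet, linkage
from scipy.spatial.distance import pdist

# Synthetic stand-in for the scaled data (assumption)
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4))

results = {}
for metric in ["euclidean", "cityblock", "chebyshev"]:
    # Condensed pairwise distances under the metric being evaluated
    d = pdist(X, metric=metric)
    # linkage accepts a condensed distance matrix directly
    Z = linkage(d, method="average")
    # Compare the tree's cophenetic distances with the same metric's distances
    c, _ = cophenet(Z, d)
    results[metric] = c

for metric, c in results.items():
    print(f"{metric}: {c:.3f}")
```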
# list of linkage methods
linkage_methods = ["single", "complete", "average", "centroid", "ward", "weighted"]
high_cophenet_corr = 0
high_dm_lm = [0, 0]
for lm in linkage_methods:
    Z = linkage(subset_scaled_df, metric="euclidean", method=lm)
    c, coph_dists = cophenet(Z, pdist(subset_scaled_df))
    print("Cophenetic correlation for {} linkage is {}.".format(lm, c))
    if high_cophenet_corr < c:
        high_cophenet_corr = c
        high_dm_lm[0] = "euclidean"
        high_dm_lm[1] = lm
Cophenetic correlation for single linkage is 0.9345526329379537.
Cophenetic correlation for complete linkage is 0.8452929808109975.
Cophenetic correlation for average linkage is 0.9495821453768725.
Cophenetic correlation for centroid linkage is 0.9471718133891771.
Cophenetic correlation for ward linkage is 0.7095968268387225.
Cophenetic correlation for weighted linkage is 0.9156327901892896.
# printing the linkage method with the highest cophenetic correlation
print(
    "Highest cophenetic correlation is {}, which is obtained with {} linkage.".format(
        high_cophenet_corr, high_dm_lm[1]
    )
)
Highest cophenetic correlation is 0.9495821453768725, which is obtained with average linkage.
# list of linkage methods
linkage_methods = ["single", "complete", "average", "centroid", "ward", "weighted"]
# lists to save results of cophenetic correlation calculation
compare_cols = ["Linkage", "Cophenetic Coefficient"]
compare = []
# to create a subplot image
fig, axs = plt.subplots(len(linkage_methods), 1, figsize=(15, 30))
# We will enumerate through the list of linkage methods above
# For each linkage method, we will plot the dendrogram and calculate the cophenetic correlation
for i, method in enumerate(linkage_methods):
    Z = linkage(subset_scaled_df, metric="euclidean", method=method)
    dendrogram(Z, ax=axs[i])
    axs[i].set_title(f"Dendrogram ({method.capitalize()} Linkage)")
    coph_corr, coph_dist = cophenet(Z, pdist(subset_scaled_df))
    axs[i].annotate(
        f"Cophenetic\nCorrelation\n{coph_corr:0.2f}",
        (0.80, 0.80),
        xycoords="axes fraction",
    )
    compare.append([method, coph_corr])
# let's create a dataframe to compare cophenetic correlations for each linkage method
df_cc = pd.DataFrame(compare, columns=compare_cols)
df_cc
| | Linkage | Cophenetic Coefficient |
|---|---|---|
| 0 | single | 0.934553 |
| 1 | complete | 0.845293 |
| 2 | average | 0.949582 |
| 3 | centroid | 0.947172 |
| 4 | ward | 0.709597 |
| 5 | weighted | 0.915633 |
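A dendrogram can also be cut programmatically to obtain flat cluster labels, which helps when checking how many observations fall into each branch. The following is a minimal sketch using SciPy's `fcluster` on synthetic data; the three group centers, the noise scale, and the choice of three clusters are assumptions for illustration only.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Synthetic data: three tight groups around distinct centers (assumption)
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.3, size=(20, 2)) for c in centers])

# Build the hierarchy, then cut it so that at most 3 flat clusters remain
Z = linkage(X, method="average", metric="euclidean")
labels = fcluster(Z, t=3, criterion="maxclust")

# Cluster sizes (fcluster labels start at 1)
sizes = np.bincount(labels)[1:]
print(sizes)
```

With `criterion="maxclust"`, `t` is an upper bound on the number of clusters rather than a distance threshold.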
# list of distance metrics
distance_metrics = ["mahalanobis", "cityblock"]
# list of linkage methods
linkage_methods = ["average","weighted"]
# to create a subplot image
# one subplot per (distance metric, linkage method) combination
fig, axs = plt.subplots(
    len(distance_metrics) * len(linkage_methods), 1, figsize=(10, 30)
)
i = 0
for dm in distance_metrics:
    for lm in linkage_methods:
        Z = linkage(subset_scaled_df, metric=dm, method=lm)
        dendrogram(Z, ax=axs[i])
        axs[i].set_title("Distance metric: {}\nLinkage: {}".format(dm.capitalize(), lm))
        coph_corr, coph_dist = cophenet(Z, pdist(subset_scaled_df))
        axs[i].annotate(
            f"Cophenetic\nCorrelation\n{coph_corr:0.2f}",
            (0.80, 0.80),
            xycoords="axes fraction",
        )
        i += 1
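# Ward linkage uses Euclidean distance by default; `affinity` is omitted because
# newer scikit-learn releases no longer accept that argument
HCmodel = AgglomerativeClustering(n_clusters=10, linkage="ward")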
HCmodel.fit(subset_scaled_df)
AgglomerativeClustering(n_clusters=10)
# adding hierarchical cluster labels to the original and scaled dataframes
subset_scaled_df["HC_Clusters"] = HCmodel.labels_
data["HC_Clusters"] = HCmodel.labels_
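By this point `subset_scaled_df` also carries the `K_means_segments` column added earlier, and from here on the `HC_Clusters` column as well, so any later fit on the full frame treats those labels as features. If that is not intended, one option is to fit on a copy with the label columns dropped. The sketch below shows the idea on synthetic data; the frame `scaled`, its `f1` to `f3` columns, and the random stand-in labels are assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import AgglomerativeClustering

# Synthetic stand-in for subset_scaled_df (assumption)
rng = np.random.default_rng(1)
scaled = pd.DataFrame(rng.normal(size=(40, 3)), columns=["f1", "f2", "f3"])
scaled["K_means_segments"] = rng.integers(0, 4, size=40)  # pretend earlier labels

# Drop any label columns before fitting; errors="ignore" skips absent ones
label_cols = ["K_means_segments", "HC_Clusters"]
features = scaled.drop(columns=label_cols, errors="ignore")

# Fit on the features only, then store the labels alongside the data
model = AgglomerativeClustering(n_clusters=4, linkage="ward")
scaled["HC_Clusters"] = model.fit_predict(features)

print(list(features.columns))
```

Fitting this way would generally give different clusters from the ones profiled in this notebook, so the tables here would need to be regenerated.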
# numeric_only=True keeps the profile to numeric columns
cluster_profile = data.groupby("HC_Clusters").mean(numeric_only=True)
cluster_profile["count_in_each_segments"] = (
    data.groupby("HC_Clusters")["P_E Ratio"].count().values
)
# let's display cluster profiles
cluster_profile.style.highlight_max(color="lightgreen", axis=0)
| HC_Clusters | Current Price | Price Change | Volatility | ROE | Cash Ratio | Net Cash Flow | Net Income | Earnings Per Share | Estimated Shares Outstanding | P_E Ratio | P_B Ratio | K_means_segments | count_in_each_segments |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 72.526388 | 4.734948 | 1.399007 | 25.315018 | 53.172161 | 93306197.802198 | 1579848003.663004 | 3.751593 | 439517549.217985 | 23.360921 | -3.030339 | 1.989011 | 273 |
| 1 | 80.825491 | 16.770902 | 1.875115 | 27.588235 | 339.000000 | -65459941.176471 | 834377058.823529 | 1.880588 | 813555933.382353 | 45.029416 | 7.898655 | 1.000000 | 17 |
| 2 | 37.122609 | -16.298641 | 2.812537 | 55.086957 | 49.304348 | -429502043.478261 | -3019905391.304348 | -7.632609 | 498616643.819130 | 85.883230 | 1.216453 | 3.000000 | 23 |
| 3 | 429.569995 | 3.761200 | 1.565483 | 18.000000 | 174.400000 | 362405800.000000 | 496954600.000000 | 9.900000 | 59200701.450000 | 55.569866 | 46.627828 | 0.000000 | 5 |
| 4 | 46.672222 | 5.166566 | 1.079367 | 25.000000 | 58.333333 | -3040666666.666667 | 14848444444.444445 | 3.435556 | 4564959946.222222 | 15.596051 | -6.354193 | 4.000000 | 9 |
| 5 | 327.006671 | 21.917380 | 2.029752 | 4.000000 | 106.000000 | 698240666.666667 | 287547000.000000 | 0.750000 | 366763235.300000 | 400.989188 | -5.322376 | 7.000000 | 3 |
| 6 | 25.640000 | 11.237908 | 1.322355 | 12.500000 | 130.500000 | 16755500000.000000 | 13654000000.000000 | 3.295000 | 2791829362.100000 | 13.649696 | 1.508484 | 6.000000 | 2 |
| 7 | 24.485001 | -13.351992 | 3.482611 | 802.000000 | 51.000000 | -1292500000.000000 | -19106500000.000000 | -41.815000 | 519573983.250000 | 60.748608 | 1.565141 | 5.000000 | 2 |
| 8 | 108.304002 | 10.737770 | 1.165694 | 566.200000 | 26.600000 | -278760000.000000 | 687180000.000000 | 1.548000 | 349607057.720000 | 34.898915 | -16.851358 | 5.000000 | 5 |
| 9 | 1274.949951 | 3.190527 | 1.268340 | 29.000000 | 184.000000 | -1671386000.000000 | 2551360000.000000 | 50.090000 | 50935516.070000 | 25.453183 | -1.052429 | 0.000000 | 1 |
# Ward linkage uses Euclidean distance by default, so `affinity` is omitted
HCmodel = AgglomerativeClustering(n_clusters=8, linkage="ward")
HCmodel.fit(subset_scaled_df)
AgglomerativeClustering(n_clusters=8)
# adding hierarchical cluster labels to the original and scaled dataframes
subset_scaled_df["HC_Clusters"] = HCmodel.labels_
data["HC_Clusters"] = HCmodel.labels_
# numeric_only=True keeps the profile to numeric columns
cluster_profile = data.groupby("HC_Clusters").mean(numeric_only=True)
cluster_profile["count_in_each_segments"] = (
    data.groupby("HC_Clusters")["P_E Ratio"].count().values
)
# let's display cluster profiles
cluster_profile.style.highlight_max(color="lightgreen", axis=0)
| HC_Clusters | Current Price | Price Change | Volatility | ROE | Cash Ratio | Net Cash Flow | Net Income | Earnings Per Share | Estimated Shares Outstanding | P_E Ratio | P_B Ratio | K_means_segments | count_in_each_segments |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 84.355716 | 3.854981 | 1.827670 | 633.571429 | 33.571429 | -568400000.000000 | -4968157142.857142 | -10.841429 | 398169036.442857 | 42.284541 | -11.589502 | 5.000000 | 7 |
| 1 | 160.085605 | 13.814152 | 1.804744 | 25.409091 | 301.590909 | 31782272.727273 | 757690136.363636 | 3.703182 | 642111562.488636 | 47.424973 | 16.700740 | 0.772727 | 22 |
| 2 | 46.672222 | 5.166566 | 1.079367 | 25.000000 | 58.333333 | -3040666666.666667 | 14848444444.444445 | 3.435556 | 4564959946.222222 | 15.596051 | -6.354193 | 4.000000 | 9 |
| 3 | 37.122609 | -16.298641 | 2.812537 | 55.086957 | 49.304348 | -429502043.478261 | -3019905391.304348 | -7.632609 | 498616643.819130 | 85.883230 | 1.216453 | 3.000000 | 23 |
| 4 | 72.526388 | 4.734948 | 1.399007 | 25.315018 | 53.172161 | 93306197.802198 | 1579848003.663004 | 3.751593 | 439517549.217985 | 23.360921 | -3.030339 | 1.989011 | 273 |
| 5 | 327.006671 | 21.917380 | 2.029752 | 4.000000 | 106.000000 | 698240666.666667 | 287547000.000000 | 0.750000 | 366763235.300000 | 400.989188 | -5.322376 | 7.000000 | 3 |
| 6 | 1274.949951 | 3.190527 | 1.268340 | 29.000000 | 184.000000 | -1671386000.000000 | 2551360000.000000 | 50.090000 | 50935516.070000 | 25.453183 | -1.052429 | 0.000000 | 1 |
| 7 | 25.640000 | 11.237908 | 1.322355 | 12.500000 | 130.500000 | 16755500000.000000 | 13654000000.000000 | 3.295000 | 2791829362.100000 | 13.649696 | 1.508484 | 6.000000 | 2 |
# Ward linkage uses Euclidean distance by default, so `affinity` is omitted
HCmodel = AgglomerativeClustering(n_clusters=7, linkage="ward")
HCmodel.fit(subset_scaled_df)
AgglomerativeClustering(n_clusters=7)
# adding hierarchical cluster labels to the original and scaled dataframes
subset_scaled_df["HC_Clusters"] = HCmodel.labels_
data["HC_Clusters"] = HCmodel.labels_
# numeric_only=True keeps the profile to numeric columns
cluster_profile = data.groupby("HC_Clusters").mean(numeric_only=True)
cluster_profile["count_in_each_segments"] = (
    data.groupby("HC_Clusters")["P_E Ratio"].count().values
)
# let's display cluster profiles
cluster_profile.style.highlight_max(color="lightgreen", axis=0)
| HC_Clusters | Current Price | Price Change | Volatility | ROE | Cash Ratio | Net Cash Flow | Net Income | Earnings Per Share | Estimated Shares Outstanding | P_E Ratio | P_B Ratio | K_means_segments | count_in_each_segments |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 208.557968 | 13.352255 | 1.781422 | 25.565217 | 296.478261 | -42268521.739130 | 835675782.608696 | 5.720000 | 616408256.122609 | 46.469678 | 15.928863 | 0.739130 | 23 |
| 1 | 84.355716 | 3.854981 | 1.827670 | 633.571429 | 33.571429 | -568400000.000000 | -4968157142.857142 | -10.841429 | 398169036.442857 | 42.284541 | -11.589502 | 5.000000 | 7 |
| 2 | 37.122609 | -16.298641 | 2.812537 | 55.086957 | 49.304348 | -429502043.478261 | -3019905391.304348 | -7.632609 | 498616643.819130 | 85.883230 | 1.216453 | 3.000000 | 23 |
| 3 | 46.672222 | 5.166566 | 1.079367 | 25.000000 | 58.333333 | -3040666666.666667 | 14848444444.444445 | 3.435556 | 4564959946.222222 | 15.596051 | -6.354193 | 4.000000 | 9 |
| 4 | 72.526388 | 4.734948 | 1.399007 | 25.315018 | 53.172161 | 93306197.802198 | 1579848003.663004 | 3.751593 | 439517549.217985 | 23.360921 | -3.030339 | 1.989011 | 273 |
| 5 | 327.006671 | 21.917380 | 2.029752 | 4.000000 | 106.000000 | 698240666.666667 | 287547000.000000 | 0.750000 | 366763235.300000 | 400.989188 | -5.322376 | 7.000000 | 3 |
| 6 | 25.640000 | 11.237908 | 1.322355 | 12.500000 | 130.500000 | 16755500000.000000 | 13654000000.000000 | 3.295000 | 2791829362.100000 | 13.649696 | 1.508484 | 6.000000 | 2 |
plt.figure(figsize=(20, 40))
plt.suptitle("Boxplot of scaled numerical variables for each cluster", fontsize=20)
# one subplot per numerical variable (11 in total) on a 3 x 5 grid
for i, variable in enumerate(num_col[:11]):
    plt.subplot(3, 5, i + 1)
    sns.boxplot(
        y=subset_scaled_df[variable],
        x=subset_scaled_df["HC_Clusters"],
    )
plt.tight_layout()
plt.figure(figsize=(20, 40))
plt.suptitle("Boxplot of original numerical variables for each cluster", fontsize=20)
# one subplot per numerical variable (11 in total) on a 3 x 5 grid
for i, variable in enumerate(num_col[:11]):
    plt.subplot(3, 5, i + 1)
    sns.boxplot(
        y=data[variable],
        x=data["HC_Clusters"],
    )
plt.tight_layout()
# importing library
from sklearn.decomposition import PCA
# setting the number of components to 2
pca = PCA(n_components=2)
# transforming data and storing results in a dataframe
X_reduced_pca = pca.fit_transform(subset_scaled_df)
reduced_df_pca = pd.DataFrame(
    data=X_reduced_pca, columns=["Component 1", "Component 2"]
)
# checking the amount of variance explained
pca.explained_variance_ratio_.sum()
0.37836198332699145
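The first two components capture roughly 38% of the variance, so the 2-D scatterplots show only part of the structure. To see how many components a given share of variance would need, the cumulative explained variance can be inspected. The snippet below is a sketch on synthetic data; the array `X`, the 90% threshold, and the `n_needed` name are assumptions for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic stand-in for the scaled data (assumption)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))

# Fit PCA with all components and accumulate the explained variance ratios
pca_full = PCA().fit(X)
cum_var = np.cumsum(pca_full.explained_variance_ratio_)

# Smallest number of components whose cumulative ratio reaches the threshold
threshold = 0.90
n_needed = int(np.searchsorted(cum_var, threshold) + 1)

print(np.round(cum_var, 3), n_needed)
```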
sns.scatterplot(data=reduced_df_pca, x="Component 1", y="Component 2")
<AxesSubplot:xlabel='Component 1', ylabel='Component 2'>
sns.scatterplot(
    data=reduced_df_pca,
    x="Component 1",
    y="Component 2",
    hue=data["HC_Clusters"],
    palette="rainbow",
)
plt.legend(bbox_to_anchor=(1, 1))